Method of creating a library of bacterial clones with varying levels of gene expression

ABSTRACT

The present invention relates to a method of creating DNA libraries that include an artificial promoter library and/or a modified ribosome binding site library and transforming bacterial host cells with the library to obtain a population of bacterial clones having a range of expression levels for a chromosomal gene of interest.

FIELD OF INVENTION

The present invention relates to the genetic modification of bacterial cells, and particularly to a method of creating DNA libraries that comprise a library of artificial promoters and/or a library of modified regulatory regions, and to the use of the libraries to replace precursor promoters and regulatory regions in bacterial host cells, resulting in a library of bacterial clones having a range of expression levels of a gene of interest.

BACKGROUND OF THE INVENTION

For many years microorganisms have been exploited in industrial applications for the production of valuable commercial products, such as industrial enzymes, hormones and antibodies. Despite the fact that recombinant DNA technology has been used in an attempt to increase the productivity of these microorganisms, the use of metabolic genetic engineering to improve strain performance, particularly in industrial fermentations, has been disappointing.

A common strategy used to increase microbial strain performance is to alter gene expression, and a number of means have been used to achieve this end. One approach includes the cloning of a heterologous or a homologous gene in a multi-copy plasmid in a selected host strain. Another approach concerns altering chromosomal gene expression. This has been accomplished by various methods, some of which include: (1) site-specific mutations, deletions or insertions at a predetermined region of a chromosome; (2) reliance on transposons to insert DNA randomly into chromosomes; and (3) altering of native regulatory regions of a gene at its chromosomal location. The alteration of regulatory regions can be accomplished, for example, by changing promoter strength or by using regulatable promoters which are influenced by inducer concentration. Reference is made to Jensen and Hammer (1998) Biotechnology and Bioengineering 58:193-195; Jensen and Hammer (1998) Appl. Environ. Microbiol. 64:82-85; and Khlebnikov et al. (2001) Microbiol. 147:3241. Other techniques used to replace regulatory regions of chromosomal genes have been disclosed in Abdel-Hamid et al. (2001) Microbiol. 147:1483-1498 and Repoila and Gottesman (2001) J. Bacteriol. 183:4012-4023.

With respect to optimizing metabolic pathway engineering in a selected host, the above-mentioned approaches have had limited success and each approach has certain disadvantages. Research has shown that the expression level of a genetically modified gene on a plasmid is not necessarily correlated with the level of expression of the same modified gene located in the chromosome (see Khlebnikov et al. (2001) Microbiol. 147:3241 and McCraken and Timms (1999) J. Bacteriol. 18:6569).

Moreover, increasing the expression of one gene in a metabolic pathway may have only a marginal effect on the flux through that metabolic pathway. This may be true even if the gene being manipulated codes for an enzyme in a rate-limiting step, because control of a metabolic pathway may be distributed over a number of enzymes. Therefore, while a gene has been engineered to achieve a high level of expression, for example a 10 to 100 fold increase in expression, the overall performance of the engineered microorganism in a bioreactor may decrease. The decrease could be due to the balance of other factors involved in the metabolic pathway or the depletion of other substances necessary for optimum cell growth.

The above problem is addressed in part by Jensen and Hammer (WO 98/07846). The disclosure of WO 98/07846 describes the construction of a set of constitutive promoters that provide different levels of gene expression. Specifically, artificial promoter libraries are constructed comprising variants of a regulatory region that includes a −35 consensus box, a −10 consensus box and a spacer (linker) region that lies between these consensus regions. However, one of the drawbacks of the method described in WO 98/07846 is the extensive screening (in terms of time and number of steps) which would be required to create a library of clones with different levels of gene expression. It is also disclosed in the reference that the modulation of promoter strength, by a few base-pair changes in the consensus sequences or by changes in the linker sequence, would result in a large impact on promoter strength, and therefore it would not be feasible to achieve small steps in promoter strength modulation.

Therefore, a need still exists in the area of metabolic pathway engineering to develop a quick and efficient means of determining the optimum expression of a gene of interest in a metabolic pathway, which in turn results in an optimization of strain performance for a desired product. The present method satisfies this need by providing a method to characterize small changes in gene expression level, hence allowing for the selection of a cell providing an optimum level of expression.

SUMMARY OF THE INVENTION

In one aspect the invention relates to a method of creating a library of artificial promoters comprising a) obtaining an insertion DNA cassette, which comprises a first recombinase site, a second recombinase site and a selective marker gene located between the first and the second recombinase sites; b) obtaining a first oligonucleotide which comprises i) a first nucleic acid fragment homologous to an upstream region of a chromosomal gene of interest, and ii) a second nucleic acid fragment homologous to a 5′ end of the insertion DNA cassette; c) obtaining a second oligonucleotide which comprises i) a third nucleic acid fragment homologous to a 3′ end of said insertion DNA cassette, ii) a precursor promoter comprising a −35 consensus region (−35 to −30), a linker sequence and a −10 consensus region (−12 to −7), wherein the linker sequence comprises between 14-20 nucleotides and is flanked by the −35 region and the −10 region, wherein said precursor promoter has been modified to include at least one modified nucleotide position of the precursor promoter and wherein the −35 region and the −10 region each include between 4 to 6 conserved nucleotides of the promoter, and iii) a fourth nucleic acid fragment homologous to a downstream region of the transcription start site of the promoter; and d) mixing the first oligonucleotide and the second oligonucleotide in an amplification reaction with the insertion DNA cassette to obtain a library of double stranded amplified products comprising artificial promoters. In one embodiment, the method further comprises purifying the amplified products. In another embodiment, the amplification step is by PCR. In another embodiment, the precursor promoter is selected from the group consisting of P_(trc) (SEQ ID NO. 2); P_(D/E20) (SEQ ID NO. 4); P_(H207) (SEQ ID NO. 3); P_(N25) (SEQ ID NO. 5); P_(G25) (SEQ ID NO. 6); P_(J5) (SEQ ID NO. 7); P_(A1) (SEQ ID NO. 8); P_(A2) (SEQ ID NO. 9); P_(A3) (SEQ ID NO. 10); P_(lac) (SEQ ID NO. 1); P_(lacUV5) (SEQ ID NO. 12); P_(CON) (SEQ ID NO. 4); P_(GI) (SEQ ID NO. 15) and P_(bla) (SEQ ID NO. 14). In a further embodiment the artificial promoter library includes the promoters designated by SEQ ID NO. 15, SEQ ID NO. 16 and SEQ ID NO. 17. In a further embodiment the invention includes the artificial promoter library produced according to the above method.

In a second aspect, the invention relates to a method of creating a library of ribosome binding sites (RBS) comprising a) obtaining an insertion DNA cassette, which comprises a first recombinase site, a second recombinase site and a selective marker gene located between the first and the second recombinase sites; b) obtaining a first oligonucleotide which comprises i) a first nucleic acid fragment homologous to an upstream region of a chromosomal gene of interest, and ii) a second nucleic acid fragment homologous to a 5′ end of the insertion DNA cassette; c) obtaining a second oligonucleotide which comprises i) a third nucleic acid fragment homologous to a 3′ end of said insertion DNA cassette, ii) a precursor promoter comprising a −35 consensus region (−35 to −30), a linker sequence and a −10 consensus region (−12 to −7), wherein the linker sequence comprises between 14-20 nucleotides and is flanked by the −35 region and the −10 region, wherein said precursor promoter has been modified to include at least one modified nucleotide position of the precursor promoter and wherein the −35 region and the −10 region each include between 4 to 6 conserved nucleotides of the promoter, and iii) a fourth nucleic acid fragment homologous to a downstream region of the transcription start site of the promoter; d) mixing the first oligonucleotide and the second oligonucleotide in an amplification reaction with the insertion DNA cassette to obtain a library of double stranded amplified products comprising artificial promoters; e) obtaining a third oligonucleotide which comprises i) a fifth nucleic acid fragment homologous to the 5′ end of said chromosomal gene of interest, ii) a modified ribosome binding site of the gene of interest, said ribosome binding site including at least one modified nucleotide, and iii) a sixth nucleic acid fragment homologous to a downstream region of the −10 region of the second oligonucleotide; and f) mixing the PCR products of step d) with the third oligonucleotide of step e) and the first oligonucleotide of step b) in a PCR reaction to obtain PCR products comprising artificial promoters with modified ribosome binding sites. In an embodiment the ribosome binding site is selected from the group consisting of AGGAAA (SEQ ID NO. 30), AGAAAA (SEQ ID NO. 31), AGAAGA (SEQ ID NO. 32), AGGAGA (SEQ ID NO. 33), AAGAAGGAAA (SEQ ID NO. 34), AAGGAAAA (SEQ ID NO. 35), AAGGAAAG (SEQ ID NO. 36), AAGGAAAU (SEQ ID NO. 37), AAGGAAAAA (SEQ ID NO. 38), AAGGAAAAG (SEQ ID NO. 39), AAGGAAAAU (SEQ ID NO. 40), AAGGAAAAA (SEQ ID NO. 41), AAGGAAAAAG (SEQ ID NO. 42), AAGGAAAAAU (SEQ ID NO. 43), AAGGAAAAAAA (SEQ ID NO. 44), AAGGAAAAAAG (SEQ ID NO. 45), AAGGAAAAAAU (SEQ ID NO. 46), AAGGAAAAAAAA (SEQ ID NO. 47), AAGGAAAAAAG (SEQ ID NO. 48), AAGGAAAAAAAU (SEQ ID NO. 49), AAGGAAAAAAAA (SEQ ID NO. 50), AAGGAAAAAAAAG (SEQ ID NO. 51), AAGGAAAAAAAAAU (SEQ ID NO. 52), AAGGAAAAAAAAAA (SEQ ID NO. 53), AAGGAAAAAAAAAG (SEQ ID NO. 54), AAGGAGGAAA (SEQ ID NO. 55), and AAGGAAAAAAAAAU (SEQ ID NO. 56). In a further embodiment the invention includes the artificial promoter library produced according to the above method.

In a third aspect, the invention relates to an artificial promoter library comprising a mixture of double stranded polynucleotides which include in sequential order: a) a nucleic acid fragment homologous to an upstream region of a chromosomal gene of interest, b) a first recombinase site, c) a nucleic acid sequence encoding an antimicrobial resistance gene, d) a second recombinase site, e) two consensus regions of a promoter and a linker sequence, wherein the first consensus region comprises a −35 region, the second consensus region comprises a −10 region and the linker sequence comprises at least 14-20 nucleotides and is flanked by the first consensus region and the second consensus region, and wherein the −35 region and the −10 region each include between 4-6 conserved nucleotides of corresponding consensus regions of the promoter, and f) a nucleic acid fragment homologous to the downstream region of the +1 transcription start site of the promoter. In one embodiment the promoter library of double stranded polynucleotides will also include a modified start codon, wherein the modified start codon sequence is located between the −10 region and the nucleic acid sequence homologous to the downstream region of the +1 transcription start site. In another embodiment the promoter library of double stranded polynucleotides further includes a stabilizing mRNA nucleic acid sequence, wherein the stabilizing mRNA sequence is located between the −10 region and the nucleic acid sequence homologous to the downstream region of the +1 transcription start site.

In a fourth aspect, the invention relates to a method of modifying a promoter in selected host cells comprising a) obtaining a library of PCR products comprising artificial promoters, RBS, start codons or stabilizing mRNA sequences or combinations thereof according to the invention; b) transforming bacterial host cells with the PCR library, wherein the PCR products comprising the artificial promoters are integrated into the bacterial host cells by homologous recombination; c) growing the transformed bacterial cells; and d) selecting the transformed bacterial cells comprising the artificial promoters. In certain embodiments the bacterial host cell is selected from the group consisting of E. coli, Pantoea sp. and Bacillus sp.

In a fifth aspect, the invention relates to a method of creating a library of bacterial cells having a range of expression levels of a chromosomal gene of interest comprising a) obtaining a library of PCR products comprising artificial promoters according to the invention; b) transforming bacterial host cells with the PCR products, wherein the PCR products comprising the artificial promoters are integrated into bacterial host cells by homologous recombination to produce transformed bacterial cells; c) growing the transformed bacterial cells; and d) obtaining a library of transformed bacterial cells wherein the library exhibits a range of expression levels of a chromosomal gene of interest. In one embodiment the method further comprises selecting transformed bacterial cells from the library. In a second embodiment the selected transformed cells will have a low level of expression of the gene of interest, and in another embodiment the selected transformed bacterial cells have a high level of expression of the gene of interest.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a schematic representation of a method of creating an artificial promoter library and the double stranded PCR products obtained according to the method of the invention. Two oligonucleotides, which are represented by numbers (1) and (2), and an insertion DNA cassette on a plasmid (3) are mixed together in a PCR reaction to form a mixture of double stranded PCR products. Oligonucleotide (1) includes nucleic acid sequences homologous to an upstream region of a chromosomal gene of interest (H1) and a primer site (PS1). The PS1 is homologous to the first end (5′) of an insertion DNA cassette (3). Oligonucleotide (2) is degenerated and includes a primer site (PS2) and artificial promoter sequences (H2). The PS2 is homologous to the second end (3′) of the insertion DNA cassette (3). The artificial promoter sequences (H2) comprise different modified −35 consensus regions, different modified linker regions, and different modified −10 consensus regions or combinations thereof. The insertion DNA cassette (3) includes a selective marker, which is preferably an antibiotic resistance gene, flanked by two recombinase sites (FRT).

FIG. 2 is a schematic representation of the method of creating a DNA library comprising artificial promoters, modified ribosome binding sites, mRNA stabilizing sequences, and/or modified start codons according to the invention. In this figure, the mixture of double stranded PCR products of FIG. 1 is mixed in a further PCR reaction with the oligonucleotide (1) and a third oligonucleotide (4) to obtain a new mixture of double stranded PCR products. The third oligonucleotide (4) comprises a nucleic acid fragment homologous to the 5′ end of the gene of interest (which is the same gene of interest as in FIG. 1); a start codon, which may be a modified start codon; a modified ribosome binding site of the precursor promoter; a stabilizing mRNA segment; and a nucleic acid fragment homologous to a downstream region of the start codon of the gene of interest. X indicates that the start codon may be modified.

FIG. 3 is a schematic representation of the replacement of a chromosomal regulatory sequence with the PCR products according to the invention.

FIG. 4 illustrates the sequences of various well-characterized promoters and includes approximately 50 base pairs (bp) upstream of the transcription start site (+1), including the −35 consensus boxes, the linker sequences and the −10 consensus boxes. The promoters are aligned with respect to the first T of the −35 consensus box and the last T of the −10 consensus box. The conserved regions are indicated in bold. P_(D/E20) is represented by SEQ ID NO. 3; P_(H207) is represented by SEQ ID NO. 4; P_(N25) is represented by SEQ ID NO. 5; P_(G25) is represented by SEQ ID NO. 6; P_(J5) is represented by SEQ ID NO. 7; P_(A1) is represented by SEQ ID NO. 8; P_(A2) is represented by SEQ ID NO. 9; P_(A3) is represented by SEQ ID NO. 10; P_(L) is represented by SEQ ID NO. 11; P_(lac) is represented by SEQ ID NO. 1; P_(lacUV5) is represented by SEQ ID NO. 12; P_(tacl) is represented by SEQ ID NO. 2; P_(con) is represented by SEQ ID NO. 13; and P_(bla) is represented by SEQ ID NO. 14.

FIG. 5 compares the chromosomal organization of the lactose operon of the wild-type strain (A) and the chromosomal organization of a host strain transformed with a promoter (B) according to the invention.

FIG. 6 illustrates a library of promoters comprising three artificial promoters used to replace the lactose operon promoter Plac (SEQ ID NO. 18) and the lacI regulator. The library of promoters comprises three artificial glucose isomerase promoters: 1.6 GI lacZ (SEQ ID NO. 19), which includes the 1.6GI promoter (SEQ ID NO. 15); 1.5 GI lacZ (SEQ ID NO. 20), which includes the 1.5GI promoter (SEQ ID NO. 16); and 1.2 GI lacZ (SEQ ID NO. 21), which includes the 1.2GI promoter (SEQ ID NO. 17).

FIG. 7 illustrates the expression of the lacZ gene measured as specific activity of β-galactosidase in a library of E. coli cells transformed with the library comprising 1.6 GI lacZ (SEQ ID NO. 19), 1.5 GI lacZ (SEQ ID NO. 20) and 1.2 GI lacZ (SEQ ID NO. 21).

FIG. 8 illustrates the expression of the lacZ gene with the 1.6GI promoter (SEQ ID NO. 19), wherein the ribosome binding site has been altered. Transformants are designated A = CAAGGAGGAA ACAGCTATG (SEQ ID NO. 22), B = CAAGAAGGAA ACAGCTATG (SEQ ID NO. 23), C = CACACAGGAA ACAGCTATG (SEQ ID NO. 24), D = CTCACAGGAG ACAGCTATG (SEQ ID NO. 25), E = CTCACAGGAA ACAGCTATG (SEQ ID NO. 26), F = CACACAGAAA ACAGCTATG (SEQ ID NO. 27), G = CTCACAGAGA ACAGCTATG (SEQ ID NO. 28) and H = CTCACAGAAA ACAGCTATG (SEQ ID NO. 29).

FIG. 9 illustrates the expression of the lacZ gene with the 1.6GI promoter (SEQ ID NO. 19), wherein the ribosome binding site (AGGAAA) has been altered and a stabilizing mRNA sequence has been inserted.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a method of creating a library of bacterial clones from amplified DNA libraries, particularly PCR generated DNA libraries, wherein the bacterial clones express a chromosomal gene of interest at different levels. The generated DNA libraries include any one of the following libraries: artificial promoters, ribosome binding sites (RBS), start codons and mRNA stabilizing sequences. An advantage of the method disclosed herein is that only one in vivo step is required to create the library of bacterial clones.

One aspect of the present invention relates to the discovery that gene expression level is changed by altering one or two nucleotides in the −35 consensus region (−35 box), the −10 consensus region (−10 box), the linker region, the RBS, and/or the start codon, and further that the alteration allows a quick identification of a range of gene expression that would produce a significant phenotypic change. In a second aspect, the invention relates to the use of precursor promoter sequences, RBSs, start codons and/or mRNA stabilizing sequences which are contained within one or two degenerated oligonucleotides so that the DNA library may be generated by one or two amplification steps.
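As an illustration of the first aspect, the following minimal Python sketch enumerates every sequence that differs from a starting consensus box at one or two nucleotide positions. The function name `variants` and the example hexamer TTGACA (the commonly cited E. coli −35 consensus) are assumptions introduced for illustration only and are not taken from the disclosure.

```python
from itertools import combinations, product

BASES = "ACGT"


def variants(seq, max_changes=2):
    """Return all sequences differing from `seq` at 1..max_changes positions.

    Hypothetical helper for illustration; not part of the disclosed method.
    """
    result = set()
    for k in range(1, max_changes + 1):
        # Choose which k positions of the box are altered.
        for positions in combinations(range(len(seq)), k):
            # At each chosen position, allow only the three other bases.
            alternatives = [[b for b in BASES if b != seq[p]] for p in positions]
            for substitution in product(*alternatives):
                mutated = list(seq)
                for p, base in zip(positions, substitution):
                    mutated[p] = base
                result.add("".join(mutated))
    return result


if __name__ == "__main__":
    box = "TTGACA"  # illustrative -35 hexamer (assumption)
    print(len(variants(box, max_changes=1)), "single-nucleotide variants")
    print(len(variants(box, max_changes=2)), "variants with one or two changes")
```

For a six-nucleotide box there are 6 × 3 = 18 single-nucleotide variants and 15 × 9 = 135 double-nucleotide variants, so even one or two alterations per box can yield a library of many members.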

Definitions

Within this application, unless otherwise stated, illustration of the techniques used may be found in any of several well-known references such as: Sambrook, J., et al., MOLECULAR CLONING: A LABORATORY MANUAL, Cold Spring Harbor Laboratory Press (1989); Goeddel, D., ed., GENE EXPRESSION TECHNOLOGY, METHODS IN ENZYMOLOGY, 185, Academic Press, San Diego, Calif. (1991); “GUIDE TO PROTEIN PURIFICATION” in Deutscher, M. P., ed., Methods in Enzymology, Academic Press, San Diego, Calif. (1989); and Innis et al., PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS, Academic Press, San Diego, Calif. (1990). Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Both Singleton et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2d Ed., John Wiley and Sons, New York (1994) and Hale and Martin, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, New York (1991) provide one of skill in the art with general dictionaries of many of the terms used in this invention.

Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described. Numeric ranges are inclusive of the numbers defining the range.

Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation and amino acid sequences are written left to right in amino to carboxy orientation. The headings provided herein are not limitations of the various aspects or embodiments of the invention, which can be had by reference to the specification as a whole. Accordingly, the terms defined immediately below are more fully defined by reference to the specification as a whole. The references, issued patents and pending patent applications cited herein are incorporated by reference into this application.

For the purpose of this invention “a DNA library” includes any one or a combination of the following: artificial promoter libraries, modified ribosome binding site (RBS) libraries, modified start codon libraries, and stabilizing mRNA libraries. While a library may include 10^3 or more members, in preferred embodiments a library will include at least 2, at least 3, at least 4, at least 6, at least 8, at least 16 or at least 64 members. A DNA library also refers to double stranded DNA molecules.
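Library sizes such as 4, 16 and 64 members are consistent with, for example, a fully degenerate nucleotide (N) at one, two or three positions of an oligonucleotide. The following Python sketch expands a degenerate oligonucleotide written with standard IUPAC ambiguity codes into its member sequences. The function name `expand_degenerate` and the example oligonucleotides are hypothetical and are not taken from the disclosure.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo):
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Hypothetical hexamers with one, two and three degenerate positions.
    for oligo in ("TTGNCA", "TTNNCA", "NNNACA"):
        print(oligo, len(expand_degenerate(oligo)))
```

A single degenerate oligonucleotide can therefore encode the whole library, which is why one amplification reaction can produce a mixture of products.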

For the purposes of this application, a “promoter” or “promoter region” is a nucleic acid sequence that is recognized and bound by a DNA dependent RNA polymerase during initiation of transcription. The promoter, together with other transcriptional and translational regulatory nucleic acid sequences (also termed “control sequences”), is necessary to express a given gene or group of genes (an operon). In general, the transcriptional and translational regulatory sequences include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. The “transcription start site” means the first nucleotide to be transcribed and is designated +1. Nucleotides downstream of the start site are numbered +2, +3, +4 etc., and nucleotides in the opposite (upstream) direction are numbered −1, −2, −3 etc. A promoter may be a regulatable promoter, such as Ptrc, which is induced by IPTG, or a constitutive promoter.
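The numbering convention described above has no position 0: the transcription start site is +1 and the nucleotide immediately upstream is −1. A minimal Python sketch of this convention follows; the function name `position_label` and the use of 0-based string indices are assumptions made for illustration.

```python
def position_label(index, tss_index):
    """Convert a 0-based sequence index to a position relative to the
    transcription start site (+1). There is no position 0.
    """
    offset = index - tss_index
    # The start site and downstream nucleotides are +1, +2, ...;
    # upstream nucleotides are -1, -2, ...
    return offset + 1 if offset >= 0 else offset


if __name__ == "__main__":
    tss = 40  # hypothetical index of the first transcribed nucleotide
    for i in (38, 39, 40, 41):
        print(i, position_label(i, tss))
```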

In the context of the present invention, a promoter includes two consensus regions. A consensus region is a distinct group of conserved short sequences recognized by RNA polymerases differing in their sigma factors. One consensus region is centered about 10 base pairs (bp) upstream from the start site of transcription initiation and is referred to as the −10 consensus region (−10 box or Pribnow box). The other consensus region is centered about 35 bp upstream of the transcriptional start site and is referred to as the −35 consensus region (−35 box). A linker sequence extends between the two consensus regions and is comprised of about 14 to 20 bp.
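The promoter layout described above (a −35 box, a linker of about 14 to 20 bp, and a −10 box) can be sketched as a simple check in Python. The function name `split_promoter`, the fixed six-nucleotide box length and the example sequence are assumptions made for illustration. The input is assumed to contain only the two boxes and the linker, with no flanking sequence.

```python
def split_promoter(core, box_len=6, min_linker=14, max_linker=20):
    """Split a core promoter into (-35 box, linker, -10 box).

    Assumes `core` starts with the -35 box and ends with the -10 box.
    Raises ValueError if the linker length is outside the allowed range.
    """
    linker_len = len(core) - 2 * box_len
    if not min_linker <= linker_len <= max_linker:
        raise ValueError(f"linker length {linker_len} outside "
                         f"{min_linker}-{max_linker} bp")
    box35 = core[:box_len]
    linker = core[box_len:box_len + linker_len]
    box10 = core[-box_len:]
    return box35, linker, box10


if __name__ == "__main__":
    # Hypothetical core promoter: consensus-like boxes with a 17 bp linker.
    core = "TTGACA" + "ATCGATCGATCGATCGA" + "TATAAT"
    print(split_promoter(core))
```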

A precursor promoter according to the invention may be a native (endogenous) promoter or an exogenous promoter. Further, a precursor promoter may be a genetically engineered promoter that is either heterologous or homologous to a gene of interest. Generally precursor promoters will be in the range of 250 to 25 base pairs (bp); 150 to 25 bp; 100 to 25 bp; 75 to 25 bp and preferably 50 to 30 bp from the transcription start site (+1).

An “artificial promoter” according to the invention is a precursor promoter that has been modified by altering a nucleotide in at least one position corresponding to a position in the −35 box, the −10 box and/or the linker sequence. In a preferred embodiment, an artificial promoter will comprise 30 to 50 bp upstream of the transcription start site (+1) and will be derived from a precursor promoter having 50 to 30 bp.

A “library of promoters” refers to a population of promoters which includes artificial promoters, having at least two members. In one embodiment a library will be derived from the same precursor promoter.

A “ribosome binding site” (RBS) is a short nucleotide sequence usually comprising about 4-16 base pairs that functions by positioning the ribosome on the mRNA molecule for translation of an encoded protein. A “modified ribosome binding site” is a ribosome binding site wherein one or more base pairs have been altered. A preferred modified RBS is derived from the same regulatory region as a precursor promoter when both the precursor promoter and RBS are modified and used in the same library. A library of modified ribosome binding sites includes at least two modified ribosome binding sites derived from the same precursor.

A “stabilizing mRNA” is a nucleic acid sequence insert used to influence gene expression. These inserts are generally located between the transcription and translational start sites of a gene or nucleic acid sequence.

A “library of bacterial clones” refers to a population of bacterial cells grown under essentially the same growth conditions and which are identical in most of their genome but include a DNA library as defined herein, which may comprise for example a library of artificial promoters. A library of bacterial clones will have different levels of expression of the same gene of interest.

As used herein, the term “nucleic acid” includes RNA, DNA and cDNA molecules. It will be understood that, as a result of the degeneracy of the genetic code, a multitude of nucleotide sequences encoding a given protein may be produced. The term nucleic acid is used interchangeably with the term “polynucleotide”. An “oligonucleotide” is a short chain nucleic acid molecule. A primer is an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced (i.e., in the presence of nucleotides and an inducing agent such as DNA polymerase and at a suitable temperature and pH). The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the inducing agent. The exact lengths of the primers will depend on many factors, including temperature, source of primer and the use of the method.

As used herein, the term “gene” means the segment of DNA involved in producing a polypeptide chain, that may or may not include regions preceding and following the coding region (e.g. 5′ untranslated (5′ UTR) or “leader” sequences and 3′ UTR or “trailer” sequences), as well as intervening sequences (introns) between individual coding segments (exons).

As used herein the term “polypeptide” refers to a compound made up of amino acid residues linked by peptide bonds. The terms protein, peptide and polypeptide are used interchangeably herein.

The term “modification” includes a deletion, insertion, substitution or interruption of at least one nucleotide or amino acid in a sequence.

As used herein, a “deletion” is defined as a change in either a nucleotide or amino acid sequence in which one or more nucleotides or amino acid residues, respectively, are absent.

As used herein, an “insertion” or “addition” is that change in a nucleotide or amino acid sequence which has resulted in the addition of one or more nucleotides or amino acid residues, respectively, as compared to a parent sequence.

As used herein, a “substitution” results from the replacement of one or more nucleotides or amino acids by different nucleotides or amino acids, respectively.

In one embodiment a modified DNA sequence is generated with site saturation mutagenesis in at least one nucleotide. In another embodiment, site saturation mutagenesis is performed for two or more nucleotides. In a further embodiment, a modified or mutant DNA sequence has more than 40%, more than 45%, more than 50%, more than 55%, more than 60%, more than 65%, more than 70%, more than 75%, more than 80%, more than 85%, more than 90%, more than 95%, more than 96%, more than 97%, or more than 98% homology with the wild-type sequence from which it was modified. In alternative embodiments, mutant DNA is generated in vivo using any known mutagenic procedure such as, for example, radiation, nitrosoguanidine and the like.

A nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence. For example, a promoter is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Linking of nucleic acid sequences may be accomplished by ligation at convenient restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors or linkers may be used in accordance with conventional practice.

As used herein a “DNA construct” refers to a nucleic acid sequence or fragment that is used to introduce sequences into a host cell or organism. The DNA may be generated in vitro by PCR or any other suitable techniques. In some embodiments a DNA construct according to the invention comprises homologous upstream (5′) and/or homologous downstream (3′) sequences to a precursor promoter, a gene of interest or to another DNA segment. In yet another embodiment a DNA construct may be inserted into a vector. The DNA constructs may include homologous or heterologous sequences to a host cell gene and further may include a combination of heterologous sequences and homologous sequences. In some embodiments, a DNA construct will include a selective marker gene. In other embodiments, a DNA construct will include an artificial promoter and in other embodiments a DNA construct will include a modified RBS sequence, a modified translational start codon and stabilizing mRNA sequences. These DNA constructs are sometimes referred to herein collectively or individually as “regulatory DNA constructs”.

As used herein, the term “vector” refers to a nucleic acid construct designed for transfer between different host cells. A vector may be a plasmid, a bacteriophage, a cloning vector, a shuttle vector or an expression vector. An “expression vector” refers to a vector that has the ability to incorporate and express heterologous DNA fragments in a foreign cell. Many prokaryotic and eukaryotic expression vectors are commercially available. Selection of appropriate expression vectors is within the knowledge of those having skill in the art. Vectors used in the process of the invention may be any vector suitable for isolation and characterization of a promoter.

As used herein, a “flanking sequence” refers to any sequence that is either upstream or downstream of the sequence being discussed (e.g., for genes A B C, gene B is flanked by the A and C gene sequences). In some embodiments, a flanking sequence is present on only a single side (either 3′ or 5′) of a DNA fragment, but in preferred embodiments, it is on each side of the sequence being flanked.

As used herein the terms “heterologous nucleic acid sequence” or “heterologous DNA construct” refer to a portion of a genetic sequence that is not native to the cell in which it is expressed. “Heterologous,” with respect to a control sequence refers to a control sequence (i.e., promoter) that does not function in nature to regulate the same gene the expression of which it is currently regulating. Generally, heterologous nucleic acid sequences are not endogenous to the cell or part of the genome in which they are present, and have been added to the cell, by infection, transfection, microinjection, electroporation, or the like. In some embodiments, “heterologous nucleic acid constructs” contain a control sequence/DNA coding sequence combination that is the same as, or different from a control sequence/DNA coding sequence combination found in the native cell.

As used herein, “homology” refers to sequence similarity or identity, with identity being preferred. This homology is determined using standard techniques known in the art (See e.g., Smith and Waterman, Adv. Appl. Math., 2:482 (1981); Needleman and Wunsch, J. Mol. Biol., 48:443 (1970); Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988); programs such as GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package (Genetics Computer Group, Madison, Wis.); and Devereux et al., Nucl. Acid Res., 12:387-395 (1984)).

The term “target site” is intended to mean a predetermined genomic location within a bacterial chromosome where integration of a DNA construct or a DNA library is to occur.

As used herein, the term “chromosomal integration” refers to the process whereby an exogenous nucleic acid sequence is introduced into the chromosome of a host cell (e.g., Bacillus). The homologous sequences of the exogenous nucleic acid sequence align with homologous regions of the chromosome. Subsequently, the sequence between the homologous regions of the chromosomal sequence is replaced by the incoming exogenous sequence in a double crossover (i.e., homologous recombination).

As used herein, the term “introduced” used in the context of inserting a nucleic acid sequence into a cell, means “transfection,” “transformation,” or “transduction,” and includes reference to the incorporation of a nucleic acid sequence into a eukaryotic or prokaryotic cell where the nucleic acid sequence may be incorporated into the genome of the cell (e.g., chromosome, plasmid, plastid, or mitochondrial DNA), converted into an autonomous replicon, or transiently expressed (for example, transfected mRNA).

As used herein, the terms “transformed,” “stably transformed,” and “transgenic” used in reference to a cell means the cell has a non-native (heterologous) nucleic acid sequence integrated into its genome or as an episomal plasmid that is maintained through two or more generations.

As used herein “an insertion DNA construct” or “insertion DNA cassette” is a DNA construct that includes a selectable marker gene which is flanked on both sides by a recombinase recognition site. A “recombinase recognition site” is a novel recombination site that facilitates directional insertion of nucleotide sequences into corresponding recombination sites at a predetermined genomic location (a target site) within the bacterial chromosome where the integration of a DNA fragment is to occur.

As used herein, the term “selectable marker” refers to a gene capable of expression in a host cell which allows for ease of selection of those hosts containing an introduced nucleic acid or vector. Examples of such selectable markers include but are not limited to antimicrobials (e.g., kanamycin, erythromycin, actinomycin, chloramphenicol and tetracycline). Thus, the term “selectable marker” refers to genes that provide an indication that a host cell has taken up an exogenous polynucleotide sequence or some other reaction has occurred. Typically, selectable markers are genes that confer antimicrobial resistance or a metabolic advantage on the host cell to allow cells containing the exogenous DNA to be distinguished from cells that have not received any exogenous sequence during the transformation.

As used herein, the terms “amplification” and “gene amplification” refer to a process by which specific DNA sequences are disproportionately replicated such that the amplified nucleic acid sequence becomes present in a higher copy number than was initially present in the genome. The term also refers to the introduction into a single cell of an amplifiable marker in conjunction with other gene sequences (i.e., comprising one or more non-selectable genes such as those contained within an expression vector) and the application of appropriate selective pressure such that the cell amplifies both the amplifiable marker and the other, non-selectable gene sequences. The amplifiable marker may be physically linked to the other gene sequences or alternatively two separate pieces of DNA, one containing the amplifiable marker and the other containing the non-selectable marker, may be introduced into the same cell.

As used herein, the term “polymerase chain reaction” (“PCR”) refers to the methods of U.S. Pat. Nos. 4,683,195; 4,683,202, and 4,965,188, hereby incorporated by reference, which include methods for increasing the concentration of a segment of a polynucleotide or target sequence in a mixture of genomic DNA without cloning or purification. This process for amplifying the target sequence consists of introducing a large excess of two oligonucleotide primers to the DNA mixture containing the desired target sequence, followed by a precise sequence of thermal cycling in the presence of a DNA polymerase. The two primers are complementary to their respective strands of the double stranded target sequence. To effect amplification, the mixture is denatured and the primers then annealed to their complementary sequences within the target molecule. Following annealing, the primers are extended with a polymerase so as to form a new pair of complementary strands. The steps of denaturation, primer annealing and polymerase extension can be repeated many times (i.e., denaturation, annealing and extension constitute one “cycle”; there can be numerous “cycles”) to obtain a high concentration of an amplified segment of the desired target sequence. The length of the amplified segment of the desired target sequence is determined by the relative positions of the primers with respect to each other, and therefore, this length is a controllable parameter. Because the desired amplified segments of the target sequence become the predominant sequences (in terms of concentration) in the mixture, they are said to be “PCR amplified”.
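The dependence of amplicon length on primer position can be illustrated with a minimal Python sketch. The template and primer sequences are hypothetical toy strings, the function names are illustrative only, and the copy-number function assumes idealized doubling at every cycle, which real reactions only approximate.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(template, forward_primer, reverse_primer):
    """Length of the segment bounded by the two primers on one template strand.

    The forward primer matches the given strand. The reverse primer anneals to
    that strand, so its reverse complement is searched for downstream.
    """
    start = template.find(forward_primer)
    if start < 0:
        raise ValueError("forward primer site not found")
    reverse_site = reverse_complement(reverse_primer)
    site = template.find(reverse_site, start + len(forward_primer))
    if site < 0:
        raise ValueError("reverse primer site not found downstream")
    return site + len(reverse_site) - start


def ideal_copies(initial_copies, cycles):
    """Idealized copy number after the given number of cycles (assumes perfect doubling)."""
    return initial_copies * 2 ** cycles


# Hypothetical toy template: 4 nt + 8 nt primer site + 10 nt + 8 nt primer site + 4 nt
template = "GGGG" + "ATGCATGC" + "TTTTTTTTTT" + "CCGGAATT" + "AAAA"
forward = "ATGCATGC"
reverse = reverse_complement("CCGGAATT")

print(amplicon_length(template, forward, reverse))
print(ideal_copies(1, 30))
```

Moving either primer site changes the returned length, which is the sense in which the length of the amplified segment is a controllable parameter.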

As used herein, the term “PCR product,” refers to the resultant mixture of compounds after two or more cycles of the PCR steps of denaturation, annealing and extension are complete. These terms encompass the case where there has been amplification of one or more segments of one or more target sequences. The term double stranded amplified products includes PCR products.

As used herein, the term “restriction enzymes” refers to bacterial enzymes, each of which cut double-stranded DNA at or near a specific nucleotide sequence.

With PCR, it is possible to amplify a single copy of a specific target sequence in genomic DNA to a level detectable by several different methodologies (e.g., hybridization with a labeled probe; incorporation of biotinylated primers followed by avidin-enzyme conjugate detection; incorporation of ³²P-labeled deoxynucleotide triphosphates, such as dCTP or dATP, into the amplified segment). In addition to genomic DNA, any oligonucleotide or polynucleotide sequence can be amplified with the appropriate set of primer molecules. In particular, the amplified segments created by the PCR process itself are, themselves, efficient templates for subsequent PCR amplifications.

As used herein, “host cell” refers to a cell that has the capacity to act as a host and expression vehicle for an introduced DNA (exogenous) sequence according to the invention.

As used herein the term “expression” refers to the process by which a polypeptide is produced based on the nucleic acid sequence of a gene. The process includes both transcription and translation.

A “range of expression levels” means the expression of a gene of interest obtained from a library of bacterial clones transformed with PCR generated DNA libraries. In one embodiment, the level of expression in a clone library will range from 1 to 500%, compared to the expression of a control which includes a precursor or native promoter and regulatory region when grown under essentially the same conditions.

“Optimal expression” refers to the cumulative conditions that provide an optimal level of gene expression for a particular coding region. Under certain laboratory conditions, optimal expression means a lower level of gene expression and under other conditions, optimal expression means a higher level of gene expression that can coexist in a cell in situations where, under certain conditions the expressed gene or product produced therefrom would be detrimental to the viability of the cells or have an adverse effect upon the cells.

“Isolated” as used herein refers to a nucleic acid or polypeptide that is removed from at least one component with which it is naturally associated.

The term “comprises” and its cognates are used in their inclusive sense; that is, equivalent to the term “including” and its cognates.

“A”, “an” and “the” include plural references unless the context clearly dictates otherwise.

PREFERRED EMBODIMENTS OF THE INVENTION

Promoter sequences useful for creating artificial promoters according to the invention include the precursor promoters listed in Table 1 below. FIG. 4 illustrates the sequence of some of these precursor promoters including the −35 box, −10 box and linker region. All promoters in the table are characterized with respect to the beta-lactamase promoter Pbla and promoter strengths are given in “Pbla-units” (Deuschle et al., EMBO Journal 5(11):2987-2994 (1986)).

In general, promoters useful in the invention include promoter sequences of between 200 to 20 base pairs (bp), preferably 150 to 25 bp, more preferably between 100 to 30 bp and most preferably between 50 to 30 bp upstream from the transcription start site (+1). The shorter sequences (between 50 to 30 bp) are most preferred because DNA libraries may be created more easily within a single degenerated oligonucleotide with the shorter sequences. Therefore in a preferred embodiment, a short sequence of the promoters as disclosed in FIG. 4 would be used to obtain artificial promoters according to the invention. These preferred sequences would include about 50 to 30 bp starting at about the transcriptional start site (+1) of said promoters.

TABLE 1

PROMOTER             Source                   Relative Activity   SEQ ID NO.
β-lactamase (bla)    E. coli vector           1                   14
PConsensus (con)     Synthetic DNA            4                   13
PTac I (Trc)         Hybrid of 2 promoters    17                  2
PLacUV5              Mutant of Lac            3.3                 12
Plac                 E. coli lacZ gene        5.7                 1
PL                   Phage λ                  37                  11
PA1                  Phage T7                 22                  8
PA2                  Phage T7                 20                  9
PA3                  Phage T7                 76                  10
PJ5                  Phage T5                 9                   7
PG25                 Phage T5                 19                  6
PN25                 Phage T5                 30                  5
PD/E20               Phage T5                 56                  4
PH207                Phage T5                 55                  3

Additional promoters useful in the invention are disclosed in Sommer et al., (2000) Microbiol. 146:2643-2653, wherein the sequence of Ptac and variants containing 1 or 2 base pair changes are taught. In one embodiment a preferred precursor promoter is a trc promoter (Ptrc). The −35 box (TTGACA) and the −10 box (TATAAT) are the same as in Ptac. However, the linker region of Ptrc includes 17 bp as compared to 16 bp for Ptac. There is an addition of a “C” between nucleotides −18 and −10 of Ptac (Russell and Bennett, (1982) Gene 20:231 and Amann et al., (1983) Gene 25:167-178).

A further useful promoter is the glucose isomerase promoter P_(GI). This promoter is also known in the literature as a xylose isomerase promoter and reference is made to Amore et al., (1989) Appl. Microbiol. Biotechnol. 30:351-357. The P_(GI) comprises the following sequence: GCCCTTGACAATGCCACATCCTGAGCA AATAAT TCAACCACTA ATTGTGAGCGGATAACA (SEQ ID NO. 15), wherein the −35 box is represented by TTGACA, the −10 box is represented by AATAAT and the +1 transcription start site is A.

In addition to the above promoters, a variety of precursor promoters can be utilized in the practice of the present invention. In some cases, strong promoters tend to be overexpressed to the detriment of the host cell viability. Cells use a limited set of signals to engage the transcriptional machinery and transcribe a gene. Bacteria such as E. coli use a core RNA polymerase and several sigma subunits to recognize different types of promoters (deHaseth et al. 1998. J. Bact. 180:3019-3025). The E. coli genes required for fast growth are mainly under the control of the sigma factor coded by the rpoD gene. The most obvious components of a RpoD-dependent promoter are the −35 and −10 regions that contain variations of the consensus sequences TTGACA and TATAAT respectively. The promoter region contains 2 other components that affect promoter strength in a subtler manner: the upstream (Gourse et al., 2000. Mol. Microbiol. 37: 687-695) and the spacer regions (Burr et al. (2000) NAR 28: 1864-1870). The contribution of each one of these 2 elements varies depending on how similar the −35 and −10 regions are to the consensus.

A precursor promoter used to obtain a library of artificial promoters as described herein may be determined by various exemplary methods. While not wanting to be limited, in one embodiment, sequencing of a particular host genome may be performed and putative promoter sequences identified using computerized searching algorithms. For example, a region of a genome may be sequenced and analyzed for the presence of putative promoters using Neural Network for Promoter Prediction software, NNPP. NNPP is a time-delay neural network consisting mostly of two feature layers, one for recognizing TATA-boxes and one for recognizing so called “initiators”, which are regions spanning the transcription start site. Both feature layers are combined into one output unit. Further precursor promoter sequences can be identified by examining the putative promoter sequences found in a genome of a host cell using homology analysis, for example by using BLAST. These putative sequences may then be cloned into a cassette suitable for preliminary characterization in E. coli and/or direct characterization in E. coli.

In another embodiment, consensus promoter sequences can be identified by examination of a family of genomes and of the putative promoter sequences identified in the genome in question using homology analysis. For example, a homology study of a family of genomes may be performed and analyzed for the presence of putative consensus promoters using BLAST. These putative promoter sequences may then be cloned into a cassette suitable for preliminary characterization in E. coli.

An artificial promoter according to the invention will comprise at least one modification to a nucleotide in a precursor promoter. In one embodiment the modification will be to a nucleotide positioned in the −35 consensus region. This modification may include a modification to one or more nucleotides at a position equivalent to a nucleotide at the −30, −31, −33, −34, −35, and/or −36 position of a precursor promoter. Preferably the modification will be of one or two nucleotides, and preferably the modification will be a substitution of one nucleotide or two nucleotides. When two positions are to be modified, four positions will be conserved, and when one position is modified, five positions will be conserved. In another embodiment the modification will include a modification to the nucleotide represented by position −30 and/or a change to a position corresponding to −35.

In preferred embodiments, an artificial promoter is obtained from a precursor promoter having a −35 box represented by the following sequences: TTGACA, TTGCTA, TTGCTT, TTGATA, TTGACT, TTTACA and TTCAAA. Particularly preferred −35 consensus regions from precursor promoters are TTTACA and TTGACA. As a non-limiting example when TTGACA is the −35 box of a precursor promoter, the nucleotide at position −30 is A and it may be substituted with a T, G or C nucleotide; the nucleotide at position −31 is C and it may be substituted with an A, T or G nucleotide; the nucleotide at position −32 is A and it may be substituted with a T, G or C nucleotide; the nucleotide at position −33 is G and it may be substituted with an A, T, or C nucleotide; the nucleotide at position −34 is T and it may be substituted with an A, G or C nucleotide; and the nucleotide at position −35 is T and it may be substituted with an A, G or C nucleotide.
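The substitutions listed above can be enumerated mechanically. The following Python sketch, offered only as an illustration, generates every variant of a six-nucleotide box carrying one or two substitutions, so that five or four positions respectively are conserved. The function name is illustrative and not taken from the invention.

```python
from itertools import combinations, product

BASES = "ACGT"


def substitution_variants(box, max_changes=2):
    """Return all sequences that differ from `box` at 1..max_changes positions."""
    variants = set()
    for k in range(1, max_changes + 1):
        for positions in combinations(range(len(box)), k):
            # At each chosen position, the three bases other than the original are allowed.
            choices = [[b for b in BASES if b != box[p]] for p in positions]
            for replacement in product(*choices):
                seq = list(box)
                for p, b in zip(positions, replacement):
                    seq[p] = b
                variants.add("".join(seq))
    return variants


singles = substitution_variants("TTGACA", max_changes=1)
singles_and_doubles = substitution_variants("TTGACA", max_changes=2)
print(len(singles), len(singles_and_doubles))
```

For a six-nucleotide box there are 6 × 3 = 18 single substitutions and 15 × 9 = 135 double substitutions, so a library limited to one or two changes in one box contains at most 153 distinct variants.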

In another embodiment, the modification will be in the −10 consensus region. This modification may include a modification to one or more nucleotides at a position corresponding to the −7, −8, −9, −10, −11, and/or −12 position of a precursor promoter. Preferably the modification will be in one or two nucleotide positions. In a particularly preferred embodiment, the precursor promoter will include the following sequences of the −10 box: TAAGAT, TATAAT, TATACT, GATACT, TACGAT, AATAAT, TATGTT and GACAAT. Particularly preferred are the sequences TATAAT, TATGTT, AATAAT and TAAGAT and most preferred are TATAAT and AATAAT. In one particular embodiment, the precursor promoter is the trc promoter and most particularly the 50 to 30 bp sequence upstream of the +1 transcription start site and the artificial promoter will include at least one modification to a nucleotide in the −10 box represented by TAAGAT. For example, since the nucleotide at position −7 is T, it may be substituted with a C, G or A nucleotide; since the nucleotide at position −8 is A, it may be substituted with a C, G or T nucleotide; since the nucleotide at position −9 is G, it may be substituted with a C, T or A nucleotide; since the nucleotide at position −10 is A, it may be substituted with a T, C or G nucleotide; since the nucleotide at position −11 is A, it may be substituted with a T, C or G nucleotide; and since the nucleotide at position −12 is T, it may be substituted with a C, G or A nucleotide.

In some embodiments of the invention, both the −35 box and the −10 box of the precursor promoter will have modifications. In one embodiment, the modification will include one nucleotide in each consensus region, and in a further embodiment the modification will include two nucleotides in each consensus region. In another embodiment a modification will include a modification to the −35 box represented by TTGACA and a modification to the −10 box represented by AATAAT. In another embodiment the modification will include a modification to the −35 box represented by TTGACA and a modification to the −10 box represented by TATAAT.

The linker sequence of a precursor promoter may also be modified to obtain an artificial promoter according to the invention. The precursor linker sequence may include deletions, substitutions or insertions. Preferably the linker sequence is between 14 and 20 base pairs in length. The length of the linker sequence may be modified to optimize expression by performing deletion analysis, such as by site directed mutagenesis to create sequential deletions in the precursor promoter. The linker sequence or the precursor promoter may be modified in length to include 16 base pairs, 17 base pairs, 18 base pairs, 19 base pairs or 20 base pairs.
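As an illustration of how linker length can be read from a promoter sequence, the following Python sketch measures the number of base pairs between the −35 box and the −10 box of the P_(GI) sequence given earlier (SEQ ID NO. 15). The function name is hypothetical, and the sketch assumes that the first occurrence of each box in the string is the functional one, which holds for this sequence but is not guaranteed in general.

```python
def linker_length(promoter, box35, box10):
    """Number of base pairs between the end of the -35 box and the start of the -10 box."""
    start35 = promoter.find(box35)
    if start35 < 0:
        raise ValueError("-35 box not found")
    end35 = start35 + len(box35)
    # Search for the -10 box only downstream of the -35 box.
    start10 = promoter.find(box10, end35)
    if start10 < 0:
        raise ValueError("-10 box not found downstream of the -35 box")
    return start10 - end35


# P_GI sequence as written in the text, joined without the spaces
P_GI = "GCCCTTGACAATGCCACATCCTGAGCA" + "AATAAT" + "TCAACCACTA" + "ATTGTGAGCGGATAACA"

length = linker_length(P_GI, "TTGACA", "AATAAT")
print(length, 14 <= length <= 20)
```

The same function can be applied to a deletion or insertion variant to confirm that its linker still falls within the preferred 14 to 20 base pair range.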

In one embodiment, modified DNA sequences in the precursor promoter are generated by using a degenerated oligonucleotide in accordance with well known techniques. In a preferred embodiment, the artificial promoters will comprise 30 to 50 bp upstream of the transcription start site (+1) so that the promoter could be contained within an oligonucleotide and the library of promoters created by degeneration of the oligonucleotide.
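A degenerated oligonucleotide is commonly written with the IUPAC nucleotide ambiguity codes, where for example N stands for any of A, C, G or T. The Python sketch below expands such a specification into the individual sequences it represents. The example specifications are hypothetical and are not the oligonucleotides of the invention.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(oligo):
    """List every concrete sequence encoded by a degenerate oligonucleotide."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in oligo.upper()))]


# Hypothetical examples: a -35 box degenerate at its last position, then at two positions
one_position = expand_degenerate("TTGACN")
two_positions = expand_degenerate("TTGNNA")
print(one_position)
print(len(two_positions))
```

Note that a fully degenerate position includes the original nucleotide, so the expanded set contains the precursor sequence as well as its variants.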

Promoter strength can be quantified using in vitro methods that measure the kinetics of binding of the RNA polymerase to a particular piece of DNA and also allow the measurement of transcription initiation (Hawley D. K. et al., Chapter 3 in: PROMOTERS: STRUCTURE AND FUNCTION. R. L. Rodriguez and M. J. Chamberlin eds. Praeger Scientific. New York). In vivo methods have also been used to quantify promoter strength. In this case, the approach has been to fuse the promoter to a reporter gene and measure the efficiency of RNA synthesis.

To create DNA libraries which comprise a library of artificial promoters, a first degenerated oligonucleotide comprising a nucleic acid sequence homologous to a first end, preferably the 3′ end, of an insertion DNA construct, a promoter as described above, and a nucleic acid sequence homologous to the downstream region of the transcription start site of a precursor or native promoter is mixed with both i) a second oligonucleotide which comprises a nucleic acid sequence homologous to an upstream region of the precursor or native promoter of a chromosomal gene of interest and a nucleic acid sequence homologous to a second end, preferably the 5′ end, of the insertion DNA construct, and ii) an insertion DNA construct in an amplification reaction, preferably a PCR reaction, to obtain double stranded amplified products comprising artificial promoters.

In a preferred embodiment, an insertion DNA construct is carried on a plasmid, preferably on a R6K plasmid, and comprises an antibiotic resistance gene flanked on both sides by a recombinase recognition site (Datsenko and Wanner (2000) Proc. Natl. Acad. Sci. USA 97:6640-6645). While any desired selective marker can be used, antibiotic resistance markers (AnbR) are most useful. These include but are not limited to, CmR, KmR and GmR. Preferably, the recombinase recognition sites are the same. Recombinase sites are well-known in the art and generally fall into two distinct families based on their mechanism of catalysis and reference is made to Huang et al., (1991) Nucleic Acids Res. 19:443 and Nunes-Duby et al., (1998) Nucleic Acid Res. 26:391-406.

A preferred recombination system is the Saccharomyces Flp/FRT recombination system, which comprises a Flp enzyme and two asymmetric 34 bp FRT minimum recombination sites (Zhu et al., (1995) J. Biol. Chem. 270:11646-11653). An FRT site comprises two 13 bp sequences, inverted and imperfectly repeated, which surround an 8 bp core asymmetric sequence where crossing-over occurs. The FLP-dependent intramolecular recombination between two parallel FRT sites results in excision of any intervening DNA sequence as a circular molecule producing two recombination products, each containing one FRT site (Huffman et al. (1999) J. Mol. Biol. 286: 1-13).
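The outcome of excision between two parallel FRT sites can be sketched as simple string manipulation in Python. The 34 bp site used here is the commonly cited minimal FRT sequence and is stated as an assumption, since the text above does not give it; the flanking and marker sequences are hypothetical placeholders, and the model ignores the exact crossover point within the 8 bp core.

```python
# Assumed minimal FRT site: 13 bp arm + 8 bp core + 13 bp arm (not given in the text)
LEFT_ARM, CORE, RIGHT_ARM = "GAAGTTCCTATTC", "TCTAGAAA", "GTATAGGAACTTC"
FRT = LEFT_ARM + CORE + RIGHT_ARM


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def flp_excise(dna, site=FRT):
    """Model excision between two directly repeated sites.

    Returns (remaining, excised): the linear molecule keeping one site, and the
    excised circle written as a linear string that also carries one site.
    """
    first = dna.find(site)
    second = dna.find(site, first + len(site)) if first >= 0 else -1
    if second < 0:
        raise ValueError("two directly repeated sites are required")
    remaining = dna[: first + len(site)] + dna[second + len(site):]
    excised = dna[first + len(site): second + len(site)]
    return remaining, excised


# Number of mismatches between the two arms when read as inverted repeats
arm_mismatches = sum(a != b for a, b in zip(LEFT_ARM, reverse_complement(RIGHT_ARM)))

# Hypothetical cassette: upstream flank, FRT, marker placeholder, FRT, downstream flank
cassette = "AAAA" + FRT + "CCCCGGGG" + FRT + "TTTT"
remaining, excised = flp_excise(cassette)
print(len(FRT), arm_mismatches, remaining.count(FRT), excised.count(FRT))
```

In this model the marker leaves the chromosome with one FRT site and a single FRT site remains at the target location, which corresponds to the two recombination products described above.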

In general, nucleic acid sequences homologous to downstream regions or upstream regions may include from 2-150 bp, preferably 5-100 bp, more preferably 5-50 bp and also 10-40 bp. In specific embodiments a nucleic acid sequence homologous to the downstream region of the transcription start site of the precursor or native promoter or a nucleic acid sequence homologous to an upstream region of the precursor promoter of a chromosomal gene of interest may include about 5 to 100 base pairs and also 5 to 50 base pairs. The nucleic acid homologous to a 5′ or 3′ end of the insertion DNA construct may include about 10 to 40 base pairs and preferably about 2 to 25 base pairs. An upstream region of the precursor promoter means a segment upstream (5′) of the −35 consensus sequence.

In further embodiments of the invention a RBS, downstream of the precursor promoter region, may be modified. Preferred RBSs which may be modified include the sequences selected from the following: AGGAAA (SEQ ID NO. 30), AGAAAA (SEQ ID NO. 31), AGAAGA (SEQ ID NO. 32), AGGAGA (SEQ ID NO. 33), AAGAAGGAAA (SEQ ID NO. 34), AAGGAAAA (SEQ ID NO. 35), AAGGAAAG (SEQ ID NO. 36), AAGGAAAU (SEQ ID NO. 37), AAGGAAAAA (SEQ ID NO. 38), AAGGAAAAG (SEQ ID NO. 39), AAGGAAAAU (SEQ ID NO. 40), AAGGAAAAAA (SEQ ID NO. 41), AAGGAAAAAG (SEQ ID NO. 42), AAGGAAAAAU (SEQ ID NO. 43), AAGGAAAAAAA (SEQ ID NO. 44), AAGGAAAAAAAG (SEQ ID NO. 45), AAGGAAAAAAU (SEQ ID NO. 46), AAGGAAAAAAAA (SEQ ID NO. 47), AAGGAAAAAAAG (SEQ ID NO. 48), AAGGAAAAAAAU (SEQ ID NO. 49), AAGGAAAAAAAAA (SEQ ID NO. 50), AAGGAAAAAAAAG (SEQ ID NO. 51), AAGGAAAAAAAAAU (SEQ ID NO. 52), AAGGAAAAAAAA (SEQ ID NO. 53), AAGGAAAAAAAAAG (SEQ ID NO. 54), AAGGAGGAAA (SEQ ID NO. 55), and AAGGAAAAAAAAAU (SEQ ID NO. 56). Most preferred RBSs include AGGAAA (SEQ ID NO. 30), AGAAAA (SEQ ID NO. 31), AGAAGA (SEQ ID NO. 32), AGGAGA (SEQ ID NO. 33), and AAGGAGGAAA (SEQ ID NO. 55). The modified RBS may include substitution, deletion or insertion of any one of the base pairs comprising the RBS.

To obtain DNA libraries comprising modified RBS libraries, an oligonucleotide comprising a nucleic acid fragment homologous to a downstream region of the −10 box of a promoter or artificial promoter, a modified RBS, and a nucleic acid fragment homologous to the 5′ end of the chromosomal gene of interest which includes the start codon, is mixed with the double stranded amplified products comprising artificial promoters as described above and under similar PCR reactions. The homologous nucleic acid fragments may comprise from 2 to 100 base pairs and preferably from 2 to 50 base pairs. In other embodiments the (XTG) start codon of the gene of interest may be modified. These modifications may include X=A, T, G, depending on the native start codon in the gene of interest.
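The start codon modification can be stated compactly: for an XTG start codon with X = A, T or G, the candidate replacements are the two codons other than the native one. A minimal Python sketch follows, with an illustrative function name that does not come from the invention.

```python
def alternative_start_codons(native):
    """Return the XTG start codons (X = A, T, G) that differ from the native codon."""
    allowed = ["ATG", "TTG", "GTG"]
    codon = native.upper()
    if codon not in allowed:
        raise ValueError("native start codon must be ATG, TTG or GTG")
    return [c for c in allowed if c != codon]


print(alternative_start_codons("ATG"))
print(alternative_start_codons("GTG"))
```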

In other embodiments of the method described herein a stabilizing mRNA sequence may be incorporated into an oligonucleotide. The oligonucleotide may comprise an artificial promoter, a modified ribosome binding site or both. The stabilizing sequences are preferably inserted between the RBS and the transcription start site.

Stabilizing mRNA sequences are well known in the art and reference is made to Carrier et al. (1999) Biotechnol. Prog. 15:58-64. Preferred mRNA stabilizing sequences include the sequences GGTCGAGTTATCTCGAGTGAGATATTGTTGACG (SEQ ID NO. 63); GGTGGACTTATCTCGAGTGAGATATTGTTGACG (SEQ ID NO. 64); CCTCGAGTTATCTCGAGTGAGATATTGTTGACG (SEQ ID NO. 65); GCTCGAGTTATCTCGAGTGAGATATTGTTGACG (SEQ ID NO. 66); CGTCGAGTTATCTCGAGTGAGATATTGTTGACG (SEQ ID NO. 67); GGTGGAGTTATCTCGAGTGAGATATTGTTGACG (SEQ ID NO. 68) and GCTGGACTTATCTCGAGTGAGATATTGTTGACG (SEQ ID NO. 69). In a preferred embodiment the stabilizing sequence is SEQ ID NO. 67. The double stranded amplified products may also include modified start codons of a gene of interest.

The double stranded amplified products which comprise artificial promoters, modified ribosome binding sites, modified start codons, stabilizing mRNA sequences and combinations thereof, according to the invention may be used individually and introduced into a host cell. Additionally, the double stranded amplified products may be used in a DNA library wherein said library comprises one or more of a library of artificial promoters, a library of modified ribosome binding sites, a library of modified start codons and which may or may not include stabilizing mRNA sequences. The DNA libraries are introduced into bacterial host cells wherein they replace the chromosomal regulatory regions of a gene of interest. Preferably the double stranded amplified products are integrated into the host cell chromosome. Flanking homologous regions of the double stranded amplified products replace homologous regions at a target site in a gene sequence of interest in a host chromosome. In a preferred embodiment, the integration of the PCR products is a stable and non-reverting integration. Preferably replacement is by a double crossover (i.e., homologous recombination). The introduced PCR products may create a library of bacterial cells having a range of expression levels for a gene of interest.

The method as disclosed herein is not limited to expression of any particular gene or group of genes (an operon), but is intended to be broadly applicable to many different genes or operons. In one preferred embodiment, the artificial promoters or other regulatory DNA constructs according to the invention will be operably linked to a coding sequence that was heterologous to a precursor promoter, and in another embodiment the artificial promoters or other regulatory DNA constructs will be operably linked to a coding sequence that was homologous to the precursor promoter. Further the coding sequence may be heterologous or endogenous to the host cell transformed according to the invention.

In some embodiments, the gene encodes therapeutically significant proteins or peptides, such as growth factors, hormones, cytokines, ligands, receptors and inhibitors, as well as vaccines and antibodies. A gene may also encode commercially important proteins or peptides, such as enzymes (e.g., proteases, amylases, glucoamylases, dehydrogenases, esterases, cellulases, galactosidases, oxidases, reductases, kinases, xylanases, laccases, phenol oxidases, chitinases, glucose oxidases, catalases, phytases, isomerases, phosphatases, and lipases). In further embodiments the gene of interest encodes global regulators; transporter proteins, such as glucose and/or DKG permeases; and enzymes from primary and secondary metabolism, such as tpi and nuo which code for triose phosphate isomerase and NADH dehydrogenase, respectively.

In one embodiment, the host cell is a bacterial cell such as a gram-positive bacterium. In another embodiment the host cell is a gram-negative bacterium. In some preferred embodiments, the host cells are cells of the genus Pantoea, the genus Bacillus or E. coli cells.

As used herein, “the genus Bacillus” includes all members known to those of skill in the art, including but not limited to B. subtilis, B. licheniformis, B. lentus, B. brevis, B. stearothermophilus, B. alkalophilus, B. amyloliquefaciens, B. clausii, B. halodurans, B. megaterium, B. coagulans, B. circulans, B. lautus, and B. thuringiensis. It is recognized that the genus Bacillus continues to undergo taxonomical reorganization. Thus, it is intended that the genus include species that have been reclassified, including but not limited to such organisms as B. stearothermophilus, which is now named “Geobacillus stearothermophilus.” The production of resistant endospores in the presence of oxygen is considered the defining feature of the genus Bacillus, although this characteristic also applies to the recently named Alicyclobacillus, Amphibacillus, Aneurinibacillus, Anoxybacillus, Brevibacillus, Filobacillus, Gracilibacillus, Halobacillus, Paenibacillus, Salibacillus, Thermobacillus, Ureibacillus, and Virgibacillus.

As used herein, “the genus Pantoea” includes all members known to those of skill in the art, including but not limited to P. agglomerans, P. dispersa, P. punctata, P. citrea, P. terrea, P. ananas and P. stewartii. It is recognized that the genus Pantoea continues to undergo taxonomical reorganization. Thus, it is intended that the genus include species that have been reclassified, including but not limited to such organisms as Erwinia herbicola.

Those skilled in the art are well aware of methods for introducing polynucleotides into host cells and particularly into E. coli, Bacillus and Pantoea host cells. General transformation techniques are disclosed in CURRENT PROTOCOLS IN MOLECULAR BIOLOGY Vol. 1, eds. Ausubel et al. John Wiley & Sons Inc, (1987) Chap. 7 and Sambrook, J., et al., MOLECULAR CLONING: A LABORATORY MANUAL, Cold Spring Harbor Laboratory Press (1989). Reference is also made to Ferrari et al., Genetics pgs 57-72 in Harwood et al. Ed. BACILLUS, Plenum Publishing Corp. 1989; Chang et al., (1979) Mol. Gen. Genet. 168:11-15; Smith et al., (1986) Appl. and Env. Microbiol. 51:634 and Potter, H. (1988) Anal Biochem 174:361-373 wherein methods of transformation, including electroporation, protoplast transformation and congression; transduction and protoplast fusion are disclosed. Transformation methods are particularly preferred.

Methods suitable for the maintenance and growth of bacterial cells iswell known and reference is made to the Manual of Methods of GeneralBacteriology, Eds. P. Gerhardt et al., American Society forMicrobiology, Washington, D.C. (1981) and T. D. Brock in Biotechnology:A Textbook of Industrial Microbiology 2 ed. (1989) Sinauer Associates,Sunderland Mass.

The transformed host cells are selected based on the phenotype responseto a selectable marker which was provided in an insertion DNA construct.In some embodiments the selectable marker may be excised out of the hostcell. (Cherepanov et al. (1995) Gene 158:9-14).

Additionally, transformants may be analyzed to verify the integration of the regulatory DNA constructs, such as artificial promoters, using various techniques. The regulatory DNA constructs including artificial promoters may be PCR verified using oligonucleotides outside the recombinase region. In one example the size of the PCR product obtained from the artificial promoter is compared to the size of the PCR product obtained from the reference promoter on an agarose gel. The regulatory DNA constructs may be verified by digesting the PCR product obtained from the artificial promoter with a restriction enzyme that is unable to digest the artificial promoter and that is able to digest the reference promoter. The regulatory DNA constructs may also be verified by evaluating gene expression and production. Many assays are known for measuring enzyme activity. For example, beta-galactosidase is the enzyme produced by the lacZ gene, and the activity of this enzyme may be determined by the assay disclosed in Miller, J. H., A SHORT COURSE IN BACTERIAL GENETICS, Cold Spring Harbor Laboratory Press, 1992. Additionally, the artificial promoter region and other regulatory regions in a host cell may be sequenced by means well known in the art (Maxam et al., (1977) PNAS USA 74:560-564).

Transformed host cells according to the invention may have expression levels of a gene of interest which may be higher or lower than the expression level of the coding region of the gene in a parent control. In one embodiment the level of gene expression in a transformed host will be between about 1 to 500%, between about 1 to 250%, between about 5 to 200%, between about 10 to 150% and between about 10 to 100% of the level of expression of the same gene in the corresponding parent. The level may also be about 5%, 15%, 25%, 35%, 45%, 55%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100%, 120%, 140%, 160%, 180% and 200% of the expression level of a corresponding parent.

Using a DNA library according to the invention, which includes an artificial promoter library, a modified RBS library, a mRNA stabilizing sequence library, or a start codon library or combinations thereof, to create a population of bacterial cells having varying levels of expression of a gene of interest is particularly useful in a metabolic engineering pathway framework.

A metabolic pathway is a series of chemical reactions that either break down a large molecule into smaller molecules (catabolism) or synthesize more complex molecules from smaller molecules (anabolism). Most of these chemical reactions are catalyzed by a number of enzymes. In many metabolic pathways there are rate-limiting enzymatic steps which serve to regulate the pathway. For example, in the glycolytic pathway wherein glucose is converted to pyruvate and ATP, phosphofructokinase is considered a key enzyme in regulation and in the pentose phosphate pathway wherein NADPH and ribose-5-phosphate are generated, glucose-6-phosphate dehydrogenase and fructose 1,6-diphosphatase are considered key enzymes.

In order to be commercially viable a chemical or protein must be capable of being produced and recovered in large quantities in an organism with low cultivation cost. Many industrial bioprocesses utilize whole-cell fermentation techniques. In many instances, the use of an isolated enzyme system is too expensive or impractical. Many enzymes, such as dehydrogenases that may be utilized to carry out chiral synthesis of pharmaceutical intermediates, require co-factors such as NAD(P) for their reactions. Cofactors are utilized stoichiometrically during the reaction and must be repeatedly added to the reaction mixture or the reaction must regenerate the cofactor. A whole-cell system provides an alternative for many of these enzymes. Other enzymes may be membrane-bound or require complex subunit or multi-enzyme complexes (such as cytochrome P-450s), allowing for simpler implementation using a whole-cell system. Finally, the synthesis of complex molecules such as steroids, antibiotics, and other pharmaceuticals may require complicated and multiple catalytic pathways.

In an isolated system, each step in a particular metabolic pathway would need to be engineered. In contrast, the organism utilized in a whole cell system provides each of the required pathways. However, the use of certain promoters may incur problems, such as being too strong. As a result, overexpression of a particular gene may occur and be detrimental to a cell. The cell's viability can thus be reduced and the production time may be limited.

The methods provided herein are utilized to provide a library of regulatory DNA constructs, such as a library of modified promoters, a library of modified RBS and a library of modified start codons, which may include stabilizing mRNA sequences, to be introduced into bacterial host cells, which results in a population of transformed cells having a range of gene expression. The range of gene expression is useful because it allows the selection of specific bacterial clones having an optimum level of expression but still maintaining cell viability (e.g. the flux production of the desired end product relative to the viability of the host cell in sustaining the desired level of production). In certain embodiments the optimum level of expression of a gene will be high and in other embodiments the optimum level of gene expression will be low. In one embodiment, the level of expression of a gene of interest in a clone library may range from −100 to +500%, also −50 to 150% and about 0 to 100%. For example, the expression of a gene of interest in certain clones of a library may be 100% less than the expression of the gene in a corresponding parent. Also, the expression of the gene of interest in certain clones may be 500% greater than the expression of the same gene in the corresponding parent.
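The percentage ranges above are stated relative to the corresponding parent. A minimal Python sketch of that arithmetic follows; the function name and the activity values are hypothetical and were chosen only to reproduce the two endpoints of the −100 to +500% range.

```python
def percent_change(clone_level: float, parent_level: float) -> float:
    """Expression of a clone relative to its parent, as a signed percentage."""
    if parent_level <= 0:
        raise ValueError("parent expression level must be positive")
    return (clone_level - parent_level) / parent_level * 100.0

# Hypothetical activities in arbitrary units (not measured values).
parent = 10.0
silent_clone = 0.0    # no detectable expression
strong_clone = 60.0   # six times the parent level

low = percent_change(silent_clone, parent)    # 100% less than the parent
high = percent_change(strong_clone, parent)   # 500% greater than the parent
```

On this reading, a clone at −100% has no measurable expression, and a clone at +500% expresses the gene at six times the parent level.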

A direct advantage of this method is that a bacterial clone may be selected based on the expression level obtained from the DNA libraries and then be ready for use in a fermentation process whereby cell viability is not negatively affected by expression of the gene of interest.

The following Examples are for illustrative purposes only and are not intended, nor should they be construed, as limiting the invention in any manner. Those skilled in the art will appreciate that variations and modifications can be made without violating the spirit or scope of the invention.

EXAMPLES

The E. coli strain MG1655 having ATCC No. 47076 was utilized to create a library of bacterial clones comprising a library of artificial promoters, a library of mRNA stabilizing sequences and a library of modified RBSs.

Example 1 Creation of a Library of Escherichia coli Clones with Different Levels of Expression of a Chromosomal Gene by Deleting a Regulator and Replacing the Natural Promoter by PCR Generated Artificial Promoters of Different Strength

This example describes the deletion of lacI, encoding a repressor, and the replacement in the Escherichia coli genome of the natural lacZ (encoding the β-galactosidase) promoter by PCR generated artificial promoters of different strength.

a) Design of the Oligonucleotides for the lacZ Promoter Replacement.

Oligonucleotides (lacZF and degenerated lacZR) were designed to amplify by PCR a cassette containing a 79 bp sequence homologous to the 5′ end of the lacI gene, a chloramphenicol-resistance encoding gene (cat) flanked by baker's yeast FRT sites, a library of three artificial GI promoter sequences (FIG. 6) and a 40 bp sequence homologous to the downstream region of the +1 transcription start site of the natural lacZ promoter.

The degenerated lacZR primers were 100 nucleotides long and included the entire sequence from the +1 of the transcription start site to the ATG of lacZ (365529 to 365567).

LacZR oligonucleotide (SEQ ID NO. 57): TAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTAGTGGTTGAATTATTTGCTCAGGATGTGGCATHGTCAAGGGCATATGAATATCCTCCTTAG, wherein H is A, C or T.

The GI promoters, from 4 bp upstream of the −35 to 8 bp downstream of the −10, were degenerated at the last base of the −35 (TTGACA, TTGACT and TTGACG) to create the diversity. The primers also carry the priming site for pKD3 (Datsenko and Wanner, (2000) PNAS, 97: 6640-6645), an R6K plasmid containing the cat gene flanked by two FRT sites.
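The effect of the single degenerate position can be checked with a short Python sketch. It expands the H in the lacZR sequence given above (SEQ ID NO. 57) and reads the −35 hexamer from the opposite strand. The helper names and the 8-nucleotide window used to locate the hexamer are illustrative choices, not part of the described method.

```python
from itertools import product

# IUPAC code used in lacZR: H stands for A, C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "H": "ACT"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

LACZR = ("TAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTAGTGGTTGAATTATTTGCTCAGGATG"
         "TGGCATHGTCAAGGGCATATGAATATCCTCCTTAG")

def expand(seq):
    """Return every unambiguous sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in seq))]

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

primers = expand(LACZR)

# lacZR is a reverse primer, so the -35 hexamer is read on the other strand.
WINDOW = "ATHGTCAA"  # illustrative window around the degenerate base
start = LACZR.index(WINDOW)
minus35 = sorted(
    reverse_complement(p[start:start + len(WINDOW)])[:6] for p in primers
)
```

Under these assumptions the oligonucleotide is 100 nucleotides long, the mixture contains three primers, and `minus35` holds TTGACA, TTGACG and TTGACT, matching the three −35 variants named in the text.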

The lacZF primer is 100 nucleotides long (SEQ ID NO. 58) and contains 79 bp of sequence (from 366734 to 366675) at the 5′ end of the lacI gene and the priming site for pKD3.

LacZF oligonucleotide (SEQ ID NO. 58): GTGAAACCAGTAACGTTATACGATGTCGCAGAGTATGCCGGTGTCTCTTATCAGACCGTTTCCCGCGTGGTGAACCAGGGTGTAGGCTGGAGCTGCTTCG

b) Amplification and Purification of the GI Promoter Replacement Cassettes.

Primers lacZF and lacZR were used to amplify the library of promoter replacement cassettes using plasmid pKD3 as a template. The amplification used 30 cycles of 94° C. for 2 minutes; 60° C. for 30 sec; 72° C. for 2 min using Taq polymerase as directed by the manufacturer (New England BioLabs). The mixture of 1.15 kb PCR products was gel purified using the QIAquick gel extraction kit (QIAGEN, Inc.).
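For planning purposes, the cycling program can be written down as data and its hold time summed. The sketch below is only a bookkeeping aid: the step names are assumptions, and ramp times, any initial denaturation and any final extension are not counted because the text does not specify them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str        # descriptive label (assumed, not from the text)
    temp_c: int      # hold temperature in degrees Celsius
    seconds: int     # hold time per cycle

# 30 cycles of 94 C for 2 min; 60 C for 30 sec; 72 C for 2 min.
PROGRAM = [
    Step("denature", 94, 120),
    Step("anneal", 60, 30),
    Step("extend", 72, 120),
]
CYCLES = 30

def total_seconds(program, cycles):
    """Total hold time of a cycling program, ignoring ramping."""
    return cycles * sum(step.seconds for step in program)

hold_seconds = total_seconds(PROGRAM, CYCLES)
hold_minutes = hold_seconds / 60
```

With the stated parameters the hold time alone comes to 135 minutes.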

c) Creation of the Library of Clones with Different Artificial Promoters in Front of the lacZ Gene.

Transformants carrying the Red Helper plasmid (pKD46) (Datsenko and Wanner, supra) were grown in 20 ml SOB medium with carbenicillin (100 mg/l) and L-arabinose (10 mM) at 30° C. to an OD_(550nm) of 0.6 and then made electrocompetent by concentrating 100-fold and washing once with ice water and twice with ice cold 10% glycerol. Electroporation was done using a Gene Pulser (BioRad, model II apparatus 165-2106) with a voltage booster and 0.2 cm chamber according to the manufacturer's instructions, using 50 μl of cells and 0.1 to 1.0 μg of the mixture of purified PCR products (as described above). Shocked cells were added to 1 ml SOC medium, incubated 2 hours at 30° C., and then half of the cells were spread on agar to select Cm^(R) transformants. Xgal (40 mg/l) was added to the agar plates to evaluate the β-galactosidase expression. If cells did not grow within 24 hours, the remainder were spread after standing overnight at 30° C.
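The volumes in this step follow from simple arithmetic, sketched below in Python. The calculation assumes no loss of cells or volume during the washes, which is an idealization; the variable names are illustrative.

```python
CULTURE_ML = 20.0            # SOB culture volume
CONCENTRATION_FOLD = 100     # cells concentrated 100-fold
CELLS_PER_SHOT_UL = 50.0     # cells used per electroporation
CARBENICILLIN_MG_PER_L = 100.0

# Volume of electrocompetent cells after concentration, in microliters.
final_volume_ul = CULTURE_ML * 1000.0 / CONCENTRATION_FOLD

# Number of 50 ul electroporations one culture can supply (idealized).
electroporations = int(final_volume_ul // CELLS_PER_SHOT_UL)

# Mass of carbenicillin needed for the 20 ml culture, in milligrams.
carbenicillin_mg = CARBENICILLIN_MG_PER_L * CULTURE_ML / 1000.0
```

Under these assumptions one 20 ml culture yields 200 μl of cells, enough for four electroporations.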

d) PCR Verification of the Transformants.

Mutants were grown overnight on LB medium with 30 mg/l Cm. 1 ml of culture was washed with ice cold water and the chromosomal DNA was recovered in the supernatant after heat treatment (5 min at 94° C.) of the washed cells. The PCR was performed using the chromosomal DNA and a set of two oligonucleotides (LacseqF and LacseqR). The amplification was performed as disclosed above. A 1.6 kb PCR product was obtained.

LacseqF oligonucleotide: GGCTGCGCAACTGTTGGGAA (SEQ ID NO. 59)
LacseqR oligonucleotide: CATTGAACAGGCAGCGGAAAAG (SEQ ID NO. 60)

The PCR product was digested by EcoRV (1 U/μg of EcoRV, 2 hrs at 37° C.). The comparison of the digestion profile of the mutants (modified precursor) with the wild-type strain showed that the EcoRV site is absent when the promoter is replaced.
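The logic of this diagnostic digest can be illustrated in Python. EcoRV recognizes GATATC and cuts bluntly in the middle of the site; the two toy amplicons below are hypothetical stand-ins, not the actual lacZ region sequences.

```python
ECORV_SITE = "GATATC"   # EcoRV recognition sequence
CUT_OFFSET = 3          # blunt cut between GAT and ATC

def digest(seq, site=ECORV_SITE, cut_offset=CUT_OFFSET):
    """Return fragment lengths after cutting a linear sequence at every site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments

# Hypothetical amplicons: the reference carries one site, the replacement none.
reference_amplicon = "AAAA" + ECORV_SITE + "CCCC"
replaced_amplicon = "AAAACCCCGGGGTT"

reference_fragments = digest(reference_amplicon)
replaced_fragments = digest(replaced_amplicon)
```

A product that still carries the site yields two fragments, while a product in which the promoter has been replaced stays intact, which is what the gel comparison reports.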

The sequence of the P_(GI) in the different clones was determined by sequencing the different 1.2 kb PCR products with the lacseqF primer. 50 μl of column purified PCR products (QIAquick, QIAGEN, Inc.) obtained from the chromosomal DNA of the mutants were used and sequenced by Genome Express (Meylan, France).

The organization of the GI lacZ promoter region in the three types of recombinant clones obtained is shown in FIG. 6. As expected, they only differ by one base pair in their −35 region and were named 1.6 GI lacZ for TTGACA, 1.5 GI lacZ for TTGACT and 1.20 GI lacZ for TTGACG.

e) β Galactosidase Activity

A 25 ml LB culture with Cm (30 mg/l) of the mutants was maintained for 5 hr at 37° C. The cells were centrifuged 10 min at 4000 g and resuspended in 300 μl of B-PER Bacterial Protein Extraction Reagent (Pierce, Rockford). After 10 min of incubation on ice, the solution was centrifuged 2 min at 12000 g at 4° C. to separate the soluble proteins from cell debris. The supernatant was used to evaluate the β-galactosidase activity. The β-galactosidase activity was measured using the synthetic substrate ONPG (ortho-nitrophenyl β-D-galactopyranoside) according to the procedure of Miller, (1992) A SHORT COURSE IN BACTERIAL GENETICS, Cold Spring Harbor Laboratory Press. The conditions of the reaction were 37° C., pH 7.3, A 410 nm, light path 1 cm (FIG. 7).

f) Elimination of the Antibiotic Resistance Gene:

pCP20 (Cherepanov et al., (1995) Gene 158:9-14) is a plasmid that carries an ampicillin resistance marker, contains a temperature sensitive origin of replication and provides thermal induction of FLP synthesis. Cm^(R) mutants were transformed with pCP20 and ampicillin resistant transformants were selected at 30° C. A few colonies were purified non-selectively at 43° C. and then tested for loss of all antibiotic resistance. The majority lost the FRT flanked resistance gene and the FLP helper plasmid simultaneously.

Example 2 Creation of a Library of Escherichia coli Clones with Different Levels of Expression of a Chromosomal Gene by Replacing the Natural Promoter with the 1.6GI and Creating a Library of RBS with PCR Generated Linear DNA Fragments

This example describes the deletion of lacI and the replacement in the Escherichia coli genome of the natural lacZ (encoding the β-galactosidase) promoter and RBS by a PCR generated artificial promoter and RBS with different binding capacities.

a) Design of the Oligonucleotides to Create a Library of Replacement Cassettes to Replace the Native Promoter and Modify the RBS and the Start Codon.

Oligonucleotide lacZRT was designed to amplify by PCR, when used with lacZF, a cassette containing a 79 bp sequence homologous to the 5′ end of the lacI gene, a chloramphenicol resistance encoding gene (cat) flanked by baker's yeast FRT sites, the 1.6GI promoter sequence (SEQ ID NO. 19) and a 40 bp sequence homologous to the downstream region of the +1 transcription start site of the natural lacZ promoter.

LacZRT oligonucleotide (SEQ ID NO. 70): TAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTAGTGGTTGAATTATTTGCTCAGGATGTGGCATGTCAAGGGCATATGAATATCCTCCTTAG

A degenerate oligonucleotide, lacZRBSR, was designed with a 60 base region homologous to lacZ after the start codon and a 40 base region homologous to the lacZRT oligonucleotide.

LacZRBSR oligonucleotide (SEQ ID NO. 61): CAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGAATCCGTAATCATGGTCATAGCTGTYTYCTBYKWGAAATTGTTATCCGCTCACAATTA, wherein B is T, C or G; K is T or G; Y is C or T; and W is A or T.

This oligonucleotide (SEQ ID NO. 61) is degenerated in the RBS sequence (AAGGAGGAAA): the 1^(st) base (A) may be replaced by a T; the 2^(nd) base (A) by a C; the 3^(rd) base (G) by an A; the 4^(th) base (G) by an A or C; the 7^(th) base (G) by an A; and the 9^(th) base (A) by a G.
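The theoretical size of this RBS library follows from the degenerate positions. The Python sketch below expands the ten-nucleotide degenerate window of lacZRBSR (TYTYCTBYKW) and reverse-complements each variant, since the oligonucleotide is a reverse primer. The helper names are illustrative, and the count is a design maximum rather than the number of clones actually recovered.

```python
from itertools import product

# IUPAC codes used in lacZRBSR.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "K": "GT", "Y": "CT", "W": "AT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

DEGENERATE_WINDOW = "TYTYCTBYKW"  # taken from SEQ ID NO. 61

def expand(seq):
    """Return every unambiguous sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in seq))]

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# RBS variants as read on the coding strand.
rbs_variants = {reverse_complement(v) for v in expand(DEGENERATE_WINDOW)}
library_size = len(rbs_variants)
```

The design encodes 96 RBS variants (2 × 2 × 3 × 2 × 2 × 2), including the AAGGAGGAAA sequence that the degeneration is described against.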

b) Amplification and Purification of the Replacement Cassettes.

Primers lacZF and lacZRT were used to amplify by PCR the 1.6 GI promoter replacement cassette using pKD3 as template DNA. The amplification used 30 cycles of 94° C. for 2 minutes; 60° C. for 30 sec; 72° C. for 2 min using Taq polymerase as directed by the manufacturer (New England BioLabs).

The lacZF and lacZRBSR primers were then used to amplify the library of replacement constructs using the 1.6GI promoter replacement cassette created above as a template. The amplification used 30 cycles of 94° C. for 2 minutes; 60° C. for 3 sec; 72° C. for 2 min using Taq polymerase as directed by the manufacturer (New England BioLabs). The 1.15 kb PCR products were gel purified using the QIAquick gel extraction kit (QIAGEN, Inc.).

c) Creation of a Library of lacZ Expression Levels in Escherichia coli by Homologous Recombination in the Chromosome Using Replacement Cassettes in the Form of Linear DNA.

Transformants carrying the Red Helper plasmid (pKD46) (Datsenko and Wanner, supra) were grown in 20 ml SOB medium with carbenicillin (100 mg/l) and L-arabinose (10 mM) at 30° C. to an OD_(550nm) of 0.6 and then made electrocompetent by concentrating 100-fold and washing once with ice water and twice with ice cold 10% glycerol. Electroporation was done using a Gene Pulser (BioRad, model II apparatus 165-2106) according to the manufacturer's instructions, using 50 μl of cells and 0.1 to 1.0 μg of the mixture of purified PCR products (as described above). Shocked cells were added to 1 ml SOC medium, incubated 2 hours at 30° C., and then half of the cells were spread on agar to select Cm^(R) transformants. Xgal (40 mg/l) was added to the agar plates to evaluate the β-galactosidase expression. If cells did not grow within 24 hours, the remainder were spread after standing overnight at 30° C.

d) PCR verification of the transformants.

Mutants were grown overnight on LB medium with 30 mg/l Cm. 1.0 ml of culture was washed with ice cold water and the chromosomal DNA was recovered in the supernatant after heat treatment (5 min at 94° C.) of the washed cells. The PCR was performed using the chromosomal DNA and the two oligonucleotides, LacseqF and LacseqR, as disclosed above in example 1. Amplification also followed the protocol of example 1. A 1.6 kb PCR product was obtained. The PCR product was digested by EcoRV (1 U/μg of EcoRV, 2 hrs at 37° C.). The comparison of the digestion profile of the mutants with the wild-type strain showed that the EcoRV site is absent when the promoter is replaced.

The sequence of the replacement cassette in the different clones was determined by sequencing the different 1.6 kb PCR products with the lacF primer. 50 μl of column-purified PCR products (QIAquick, QIAGEN, Inc.) obtained from the chromosomal DNA of the mutants were used and sequenced by Genome Express (Meylan, France).

Eight of the recombinant clones were designated as indicated below and the organization of the upstream region of lacZ in each recombinant clone is A=CAAGGAGGAA ACAGCTATG (SEQ ID NO. 22), B=CAAGAAGGAA ACAGCTATG (SEQ ID NO. 23), C=CACACAGGAA ACAGCTATG (SEQ ID NO. 24), D=CTCACAGGAG ACAGCTATG (SEQ ID NO. 25), E=CTCACAGGAA ACAGCTATG (SEQ ID NO. 26), F=CACACAGAAA ACAGCTATG (SEQ ID NO. 27), G=CTCACAGAGA ACAGCTATG (SEQ ID NO. 28), and H=CTCACAGAAA ACAGCTATG (SEQ ID NO. 29).

As expected, the transformants differed only by RBS, and the range of expression among the different clones of the library was from 5.7 to 0.02 U/mg of protein (FIG. 8).

Elimination of the antibiotic resistance gene was performed as disclosed in example 1.

Example 3 Creation of a Library of Escherichia coli Clones with Different Levels of Expression of a Chromosomal Gene by Both Replacing the Native Promoter by the 1.6 GI Promoter and Introducing mRNA Stabilizing Structures Using a Library of PCR Generated Linear DNA Fragments

This example describes the deletion of lacI and the replacement in the Escherichia coli genome of the natural lacZ (encoding the β-galactosidase) promoter and the lac operator by PCR generated artificial promoters of different strength and artificial mRNA stabilizing structures with different efficiencies.

a) Design of the Oligonucleotides to Create a Library of Replacement Cassettes to Replace the Promoter and the lac Operator by a Library of Artificial Promoters and mRNA Stabilizing Structures.

To generate a broader range of lacZ expression levels, a library of replacement cassettes was designed to remove lacI, the natural lacZ promoter and the lac operator and replace them by the 1.6 GI promoter and a library of mRNA stabilizing structures. For this purpose, a degenerate oligonucleotide, lacZMRNA, was designed with a 43 base region homologous to lacZ downstream of the RBS site, 34 bases of mRNA stabilizing structure and a 23 base region homologous to the lacZRT oligonucleotide upstream of the +1 of transcription. This oligonucleotide is degenerated in the mRNA stabilizing sequence.

LacZmRNA R oligonucleotide (SEQ ID NO. 62): CGACGGCCAGTGAATCCGTAATCATGGTCATAGCTGTTTCCTCCTTCGTCAACAATATCTCACTCGAGATAASTCGASSTAGTGGTTGAATTATTTGCTCAGG, wherein S is C or G.
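The number of stabilizing-structure variants encoded by this oligonucleotide can be enumerated directly. The Python sketch below expands the S positions of the sequence given above (SEQ ID NO. 62); the helper name is illustrative, and the result is the theoretical design size, not the number of clones obtained.

```python
from itertools import product

# IUPAC code used in lacZMRNA: S stands for C or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG"}

LACZMRNA = ("CGACGGCCAGTGAATCCGTAATCATGGTCATAGCTGTTTCCTCCTTCGTCAACAATATCT"
            "CACTCGAGATAASTCGASSTAGTGGTTGAATTATTTGCTCAGG")

def expand(seq):
    """Return every unambiguous sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in seq))]

degenerate_positions = [i for i, base in enumerate(LACZMRNA) if base == "S"]
variants = expand(LACZMRNA)
```

With three S positions, each C or G, the design encodes 2 × 2 × 2 = 8 distinct oligonucleotides.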

If lacF and lacMRNA are used in a PCR reaction with the promoter replacement cassette (generated by PCR using the primers lacZF and lacZRT (SEQ ID NO. 70)) as template DNA, a new library will be obtained with lacI deleted, the promoter replaced and the mRNA stabilizing structure introduced.

b) Amplification and purification of the replacement cassettes:

Primers lacF and lacZMRNA were used to amplify the library of replacement cassettes using the 1.6 GI promoter replacement cassette created in example 2 as template DNA. Amplification followed the procedures of example 1. The 1.15 kb PCR products were purified by agarose gel electrophoresis followed by the QIAquick gel extraction kit (QIAGEN).

c) Creation of a Library of lacZ Expression Levels in Escherichia coli by Homologous Recombination in the Chromosome Using Replacement Cassettes in the Form of Linear DNA:

Transformants carrying the Red Helper plasmid (pKD46) (Datsenko and Wanner, supra) were grown in 20 ml SOB medium with carbenicillin (100 mg/l) and L-arabinose (10 mM) at 30° C. to an OD_(550nm) of 0.6 and then made electrocompetent by concentrating 100-fold and washing once with ice water and twice with ice cold 10% glycerol. Electroporation was done using a Gene Pulser (BioRad, model II apparatus 165-2106) with a voltage booster and 0.2 cm chambers according to the manufacturer's instructions, using 50 μl of cells and 0.1 to 1.0 μg of the purified PCR products (as described in b) above). Shocked cells were added to 1 ml SOC medium, incubated 2 hours at 30° C., and then half of the cells were spread on agar to select Cm^(R) transformants. Xgal (40 mg/l) was added to the agar plates to evaluate the β-galactosidase expression. If cells did not grow within 24 hours, the remainder were spread after standing overnight at 30° C.

d) PCR Verification of the Transformants.

Mutants were grown overnight on LB medium with 30 mg/l Cm. 1.0 ml of culture was washed with ice cold water and the chromosomal DNA was recovered in the supernatant after heat treatment (6 min at 94° C.) of the washed cells.

The PCR was performed using the chromosomal DNA and a set of two oligonucleotides, LacseqF and LacseqR, as disclosed above in example 1. Amplification also followed the protocol of example 1. A 1.6 kb PCR product was obtained. The PCR product was digested by EcoRV (1 U/μg EcoRV, 2 hrs at 37° C.). The comparison of the digestion profile of the mutants with the wild-type strain showed that the EcoRV site is absent when the promoter is replaced.

The sequence of the replacement cassette in the different clones was determined by sequencing the different 1.6 kb PCR products with the lacF primer. 50 μl of column-purified PCR products (QIAquick, QIAGEN, Inc.) obtained from the chromosomal DNA of the mutants were used and sequenced by Genome Express (Meylan, France).

The organization of the upstream region of lacZ of the recombinant clones is shown in FIG. 9. As expected, the range of expression among the different clones of the library was from 4.1 to 18.4 U/mg protein.

Example 4 Creation of a Library of Escherichia coli Clones with Different Artificial Promoters, Modified Start Codons and Modified RBS Using a Library of PCR Generated Linear DNA Fragments

This example describes the deletion of lacI and the replacement in the Escherichia coli genome of the natural lacZ (encoding the β-galactosidase) promoter, RBS and start codon by PCR generated artificial promoters of different strength, RBS with different binding capacity and start codons of different efficiency.

a) Design of the Oligonucleotides for the lacZ Promoter Replacement.

To generate a broader range of lacZ expression levels, a library of replacement cassettes was designed to remove lacI, replace the promoter and modify the RBS. An oligonucleotide degenerate in the RBS and in the start codon, lacZRBSR2, was designed with a 60 base region homologous to lacZ after the start codon and a 40 base region homologous to the lacR oligonucleotide.

LacZRBSR2 oligonucleotide (SEQ ID NO. 71): CAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGAATCCGTAATCATGGTCAHAGCTGTYTYCTBYKWGAAATTGTTATCCGCTCACAATTA, wherein B is T, C or G; H is A, T or C; K is T or G; Y is C or T; and W is A or T.
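The additional H position sits in the reverse complement of the start codon. The Python sketch below reads the start codons encoded by the CAH triplet of lacZRBSR2 and counts the theoretical number of distinct oligonucleotides in the mixture. Helper names are illustrative, and the count is a design maximum only.

```python
from itertools import product
from math import prod

# IUPAC codes used in lacZRBSR2.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "H": "ACT", "K": "GT", "Y": "CT", "W": "AT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

LACZRBSR2 = ("CAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGAATCCGTAATCATGGTCA"
             "HAGCTGTYTYCTBYKWGAAATTGTTATCCGCTCACAATTA")

def expand(seq):
    """Return every unambiguous sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in seq))]

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# The primer triplet CAH is the reverse complement of the start codon.
start_codons = sorted(reverse_complement(t) for t in expand("CAH"))

# Theoretical number of distinct oligonucleotides in the degenerate mixture.
library_size = prod(len(IUPAC[base]) for base in LACZRBSR2)
```

The three start codons are ATG, GTG and TTG, and combining them with the 96 RBS variants gives 288 possible oligonucleotides before the promoter variants are taken into account.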

b) Amplification and Purification of the P_(GI) Replacement Cassettes.

Primers lacZF and lacZR were used to amplify the library of promoter replacement cassettes using plasmid pKD3 as a template as described in example 1. Primers lacZF and lacZRBSR2 were used to amplify the library of promoter replacement cassettes with a modified start codon and a modified RBS using the mixture of PCR products obtained above as a template. Amplification followed the procedures of example 1. The 1.15 kb PCR products were purified by agarose gel electrophoresis followed by the QIAquick gel extraction kit (QIAGEN).

c) Creation of the Library of Clones with Different Artificial Promoters with Modified Start Codons and Modified RBS in Front of the lacZ Gene.

Transformants carrying the Red Helper plasmid (pKD46) (Datsenko and Wanner, supra) were grown in 20 ml SOB medium with carbenicillin (100 mg/l) and L-arabinose (10 mM) at 30° C. to an OD_(550nm) of 0.6 and then made electrocompetent by concentrating 100-fold and washing once with ice water and twice with ice cold 10% glycerol. Electroporation was done using a Gene Pulser (BioRad, model II apparatus 165-2106) with a voltage booster and 0.2 cm chambers according to the manufacturer's instructions, using 50 μl of cells and 0.1 to 1.0 μg of the purified PCR products (as described above). Shocked cells were added to 1 ml SOC medium, incubated 2 hours at 30° C., and then half of the cells were spread on agar to select Cm^(R) transformants. Xgal (40 mg/l) was added to the agar plates to evaluate the β-galactosidase expression. If cells did not grow within 24 hours, the remainder were spread after standing overnight at 30° C.

d) PCR Verification of the Transformants.

Mutants were grown overnight on LB medium with 30 mg/l Cm. 1.0 ml of culture was washed with ice cold water and the chromosomal DNA was recovered in the supernatant after heat treatment (5 min at 94° C.) of the washed cells.

The PCR was performed using the chromosomal DNA and a set of two oligonucleotides, LacseqF and LacseqR, as disclosed above in example 1. Amplification also followed the protocol of example 1. A 1.6 kb PCR product was obtained. The PCR product was digested by EcoRV (1 U/μg of EcoRV, 2 hrs at 37° C.). The comparison of the digestion profile of the mutants with the wild-type strain showed that the EcoRV site disappeared with the promoter replacement.

The sequence of the GI promoter in the different clones was determined by sequencing the different PCR products with the lacseqF primer. 50 μl of column-purified PCR products (QIAquick, QIAGEN, Inc.) obtained from the chromosomal DNA of the mutants were used and sequenced by Genome Express (Meylan, France). The organization of the upstream region of lacZ in four of the recombinant clones obtained was as expected.

-   1.6GI, clone 1: start codon TTG; RBS TCACAGGAGA; β-galactosidase activity, 0.28 U/mg;
-   1.6GI, clone 2: start codon ATG; RBS AAGGAGGAA; β-galactosidase activity, 5.7 U/mg;
-   1.2GI, clone 3: start codon ATG; RBS ACACAGGAAA; β-galactosidase activity, 0.68 U/mg; and
-   1.6GI, clone 4: start codon TTG; RBS ACACAGAAGA; β-galactosidase activity, 0.032 U/mg.
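The spread of activities among these four clones can be summarized with a few lines of Python. The data are the values listed above; the tuple layout and variable names are illustrative.

```python
# (clone, promoter, start codon, RBS, beta-galactosidase activity in U/mg)
clones = [
    ("clone 1", "1.6GI", "TTG", "TCACAGGAGA", 0.28),
    ("clone 2", "1.6GI", "ATG", "AAGGAGGAA", 5.7),
    ("clone 3", "1.2GI", "ATG", "ACACAGGAAA", 0.68),
    ("clone 4", "1.6GI", "TTG", "ACACAGAAGA", 0.032),
]

# Rank clones from highest to lowest activity.
ranked = sorted(clones, key=lambda clone: clone[4], reverse=True)
ranking = [clone[0] for clone in ranked]

# Ratio between the most and least active clone.
fold_range = ranked[0][4] / ranked[-1][4]
```

Clone 2 is the most active and clone 4 the least, a difference of roughly 178-fold, which illustrates how combining promoter, RBS and start codon changes widens the expression range.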

Those skilled in the art will recognize, or be able to ascertain using not more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

1. A method of creating a library of artificial promoters comprising a) obtaining an insertion DNA cassette, which comprises, a first recombinase site, a second recombinase site and a selective marker gene located between the first and the second recombinase sites; b) obtaining a first oligonucleotide which comprises, i) a first nucleic acid fragment homologous to an upstream region of a chromosomal gene of interest, and ii) a second nucleic acid fragment homologous to a 5′ end of the insertion DNA cassette; c) obtaining a second oligonucleotide which comprises, i) a third nucleic acid fragment homologous to a 3′ end of said insertion DNA cassette, ii) a precursor promoter comprising a −35 consensus region (−35 to −30), a linker sequence and a −10 consensus region (−12 to −7), wherein the linker sequence comprises between 14-20 nucleotides and is flanked by the −35 region and the −10 region, wherein said precursor promoter has been modified to include at least one modified nucleotide position of the precursor promoter and wherein the −35 region and the −10 region each include between 4 to 6 conserved nucleotides of the promoter, and iii) a fourth nucleic acid fragment homologous to a downstream region of the transcription start site of the promoter; and d) mixing the first oligonucleotide and the second oligonucleotide in an amplification reaction with the insertion DNA cassette to obtain a library of double stranded amplified products comprising artificial promoters.
2. The method according to claim 1 further comprising purifying the amplified products.
3. The method according to claim 1, wherein the amplification step is a polymerase chain reaction step.
4. The method according to claim 1, wherein the −35 region of the precursor promoter is selected from the group consisting of TTGACA, TTGCTA, TTGCTT, TTGATA, TTGACT, TTTACA and TTCAAA.
5. The method according to claim 1, wherein the −35 region of the precursor promoter comprises a modification to the −30 residue of the precursor promoter.
6. The method according to claim 1, wherein the −10 region of the precursor promoter is selected from the group consisting of TAAGAT, TATAAT, AATAAT, TATACT, GATACT, TACGAT, TATGTT and GACAAT.
7. The method according to claim 1, wherein the −35 region of the precursor promoter is TTGACA and the −10 region of the precursor promoter is TATAAT.
8. The method according to claim 1, wherein the −35 region of the precursor promoter is TTGACA and the −10 region of the precursor is AATAAT.
9. The method according to claim 1, wherein the linker sequence comprises between 16 and 18 nucleotides.
10. The method according to claim 1, wherein the precursor promoter is obtained from a promoter selected from the group consisting of P_(trc) (SEQ ID NO. 2); P_(D/E20) (SEQ ID NO. 4); P_(H207) (SEQ ID NO. 3); P_(N25) (SEQ ID NO. 5); P_(G25) (SEQ ID NO. 6); P_(J5) (SEQ ID NO. 7); P_(A1) (SEQ ID NO. 8); P_(A2) (SEQ ID NO. 9); P_(A3) (SEQ ID NO. 10); P_(lac) (SEQ ID NO. 1); P_(GI) (SEQ ID NO. 15); P_(lacUV5) (SEQ ID NO. 12); P_(CON) (SEQ ID NO. 4); and P_(bls) (SEQ ID NO. 14).
11. The method according to claim 1, wherein the library of artificial promoters includes SEQ ID NO. 15, SEQ ID NO. 16 and SEQ ID NO. 17.
12. The method according to claim 1, wherein the precursor promoter and the chromosomal gene of interest are heterologous.
13. The method according to claim 1, wherein the precursor promoter and the chromosomal gene of interest are homologous.
14. The method according to claim 1 further comprising modifying the ribosome binding site including, d) obtaining a third oligonucleotide which comprises, i) a fifth nucleic acid fragment homologous to the 5′ end of said chromosomal gene of interest, ii) a modified ribosome binding site of the gene of interest, said ribosome binding site includes at least one modified nucleotide, and iii) a sixth nucleic acid fragment homologous to a downstream region of the −10 region of the second oligonucleotide; and e) mixing the PCR products of claim 1 with the third oligonucleotide and the first oligonucleotide of claim 1 in a PCR reaction to obtain PCR products comprising artificial promoters with modified ribosome binding sites.
15. The method according to claim 14, wherein the ribosome binding site from the precursor promoter is selected from the group consisting of AGGAAA (SEQ ID NO. 30), AGAAAA (SEQ ID NO. 31), AGAAGA (SEQ ID NO. 32), AGGAGA (SEQ ID NO. 33), AAGAAGGAAA (SEQ ID NO. 34), AAGGAAAA (SEQ ID NO. 35), AAGGAAAG (SEQ ID NO. 36), AAGGAAAU (SEQ ID NO. 37), AAGGAAAAA (SEQ ID NO. 38), AAGGAAAAG (SEQ ID NO. 39), AAGGAAAAU (SEQ ID NO. 40), AAGGAAAAAA (SEQ ID NO. 41), AAGGAAAAAG (SEQ ID NO. 42), AAGGAAAAAU (SEQ ID NO. 43), AAGGAAAAAAA (SEQ ID NO. 44), AAGGAAAAAAG (SEQ ID NO. 45), AAGGAAAAAAU (SEQ ID NO. 46), AAGGAAAAAAAA (SEQ ID NO. 47), AAGGAAAAAAAG (SEQ ID NO. 48), AAGGAAAAAAAU (SEQ ID NO. 49), AAGGAAAAAAAAA (SEQ ID NO. 50), AAGGAAAAAAAAG (SEQ ID NO. 51), AAGGAAAAAAAU (SEQ ID NO. 52), AAGGAAAAAAAAAA (SEQ ID NO. 53), AAGGAAAAAAAAAG (SEQ ID NO. 54), AAGGAGGAAA (SEQ ID NO. 55), and AAGGAAAAAAAAAU (SEQ ID NO. 56).
 16. The method according to claim 14 further comprising inserting a stabilizing mRNA sequence between the modified ribosome binding site and a transcription initiation site of the third oligonucleotide.
 17. The method of claim 14, further comprising altering the start codon of the gene of interest in the third oligonucleotide.
 18. The method according to claim 1 further comprising, d) obtaining a third oligonucleotide comprising i) a fifth nucleic acid fragment homologous to the 5′ end of the chromosomal gene of interest in claim 1, ii) a start codon of the gene of interest, wherein said start codon is degenerated and includes at least one modification oligonucleotide and iii) a sixth nucleic acid fragment homologous to the downstream region of the −10 region of the second oligonucleotide, and e) mixing the PCR products of claim 1 with the third oligonucleotide and the first oligonucleotide in a PCR reaction to obtain PCR products comprising artificial promoters with modified start codons.
 19. The method according to claim 17 further comprising inserting a stabilizing mRNA sequence between the −10 box of the artificial promoter and a transcription initiation site of the third oligonucleotide.
 20. The artificial promoter library produced by the method of claim 1.
 21. The artificial promoter library produced by the method of claim 2.
 22. An artificial promoter library comprising a mixture of double stranded polynucleotides which include in sequential order: a) a nucleic acid fragment homologous to an upstream region of a chromosomal gene of interest, b) a first recombinase site, c) a nucleic acid sequence encoding an antimicrobial resistance gene, d) a second recombinase site, e) two consensus regions of a promoter and a linker sequence, wherein the first consensus region comprises a −35 region, the second consensus region comprises a −10 region and the linker sequence comprises at least 14-20 nucleotides and is flanked by the first consensus region and wherein the −35 region and the −10 region each include between 4-6 conserved nucleotides of corresponding consensus regions of the promoter, and f) a nucleic acid fragment homologous to the downstream region of the +1 transcription start site of the promoter.
 23. The artificial promoter library of claim 22, wherein the double stranded polynucleotides further include a modified ribosome binding site of the promoter wherein said binding site is located between the −10 region and the nucleic acid sequence homologous to the downstream region of the +1 transcription start site.
 24. The artificial promoter library of claim 22, wherein the double stranded polynucleotides further include a modified start codon, wherein the modified start codon sequence is located between the −10 region and the nucleic acid sequence homologous to the downstream region of the +1 transcription start site.
 25. The artificial promoter library of claim 22, wherein the double stranded polynucleotides further include a stabilizing mRNA nucleic acid sequence, wherein the stabilizing mRNA sequence is located between the −10 region and the nucleic acid sequence homologous to the downstream region of the +1 transcription start site.
 26. The artificial promoter library of claim 22, wherein the −35 region includes a substitution in one nucleotide position with the remaining nucleotide positions conserved.
 27. The artificial promoter library of claim 26, further including a substitution in one nucleotide position of the −10 region with the remaining nucleotide positions conserved.
 28. A method of modifying a promoter in selected host cells comprising a) obtaining a library of PCR products comprising artificial promoters according to claim 1; b) transforming bacterial host cells with the PCR library, wherein the PCR products comprising the artificial promoters are integrated into the bacterial host cells by homologous recombination; c) growing the transformed bacteria cells; d) selecting the transformed bacterial cells comprising the artificial promoters.
 29. A method of modifying a promoter in selected host cells comprising a) obtaining a library of PCR products comprising artificial promoters according to claim 14; b) transforming bacterial host cells with the PCR library, wherein the PCR products comprising the artificial promoters are integrated into the bacterial host cells by homologous recombination to produce transformed bacterial cells; c) growing the transformed bacteria cells; d) selecting the transformed bacterial cells comprising at least one artificial promoter.
 30. A method of modifying a promoter in selected host cells comprising a) obtaining a library of PCR products comprising artificial promoters according to claim 18; b) transforming bacterial host cells with the PCR library, wherein the PCR products comprising the artificial promoters are integrated into the bacterial host cells by homologous recombination to produce transformed bacterial cells; c) growing the transformed bacteria cells; d) selecting the transformed bacterial cells comprising at least one artificial promoter.
 31. The method according to claim 28, wherein the bacterial host cell is selected from the group consisting of E. coli, Pantoea sp. and Bacillus sp.
 32. The method according to claim 29, wherein the bacterial host cell is selected from the group consisting of E. coli, Pantoea sp. and Bacillus sp.
 33. The method according to claim 30, wherein the bacterial host is selected from the group consisting of E. coli, Pantoea sp. and Bacillus sp.
 34. A method of creating a library of bacterial cells having a range of expression levels of a chromosomal gene of interest comprising, a) obtaining a library of PCR products comprising artificial promoters according to claim 1; b) transforming bacterial host cells with the PCR products, wherein the PCR products comprising the artificial promoters are integrated into bacterial host cells by homologous recombination to produce transformed bacterial cells; c) growing the transformed bacteria cells; and d) obtaining a library of transformed bacterial cells wherein the library exhibits a range of expression levels of a chromosomal gene of interest.
 35. The method according to claim 34, further comprising selecting transformed bacterial cells from the library.
 36. The method of claim 35, wherein the selected transformed bacterial cells have a low level of expression of the gene of interest.
 37. The method of claim 35, wherein the selected transformed bacterial cells have a high level of expression of the gene of interest.
 38. The method according to claim 35 further comprising excising the selective marker gene from the transformed bacterial cells.
 39. Transformed bacterial cells selected according to the method of claim 35.
 40. The method according to claim 35, wherein the bacterial host cell is an E. coli, Bacillus sp. or Pantoea sp. cell.
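
By way of illustration only, the single-substitution libraries recited in claims 26 and 27 can be enumerated with a short Python sketch. The hexamers TTGACA and TATAAT are the commonly cited E. coli sigma-70 consensus sequences and are used here merely as assumed example inputs; the function names and the 17-nucleotide placeholder linker (within the 14-20 nucleotide range of claim 22) are hypothetical and are not taken from the specification.

```python
BASES = "ACGT"

def single_substitution_variants(consensus: str) -> list[str]:
    """Return every sequence differing from `consensus` at exactly one position."""
    consensus = consensus.upper()
    variants = []
    for pos, original in enumerate(consensus):
        for base in BASES:
            if base != original:
                # Substitute one nucleotide; all other positions stay conserved.
                variants.append(consensus[:pos] + base + consensus[pos + 1:])
    return variants

def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

# Assumed example inputs (not taken from the specification).
MINUS_35 = "TTGACA"   # assumed -35 consensus hexamer
MINUS_10 = "TATAAT"   # assumed -10 consensus hexamer
LINKER = "N" * 17     # hypothetical placeholder linker of 17 nucleotides

# Promoter cores with one substitution in the -35 region and a conserved -10 region.
library = [v + LINKER + MINUS_10 for v in single_substitution_variants(MINUS_35)]

if __name__ == "__main__":
    print(len(library))  # 6 positions x 3 alternative bases = 18
```

A hexamer yields 18 single-substitution variants, so a library varied in this way at the −35 region alone would contain at most 18 distinct promoter cores for a given linker.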